ABSTRACT


An enumeration algorithm which synthesizes programs from
example computations is presented. The algorithm, originally
proposed by Alan W. Biermann of Duke University, assigns a
labelling of the instructions contained in an example trace
consistent with producing minimum state Moore machine
representations for the synthesized programs. Techniques for
processing the information to reduce enumeration are given.
Biermann's algorithm is extended by trace preprocessing
techniques which identify and generalize conditions on
instruction sequencing in the synthesized programs without
the user's assistance. The techniques are presented using
text editing as the domain, but are general enough to be
extendable into other domains.


TABLE OF CONTENTS

I.   INTRODUCTION
     A.   BACKGROUND
     B.   AUTOMATIC PROGRAMMING
          1.   General
          2.   Problem Specification with Natural Language
          3.   Formal Problem Specification
          4.   Input-Output Pair Specification
          5.   Example Computations
          6.   A General Automatic Programmer Design
     C.   OBJECTIVES
     D.   THESIS ORGANIZATION

II.  SYNTHESIZER
     A.   GOALS
     B.   OVERVIEW
          1.   General Description
          2.   Trace Coding
          3.   Input/Output Trace Representation
     C.   SYNTHESIS PROCEDURE
          1.   Function
          2.   Concepts
     D.   SYNTHESIZER STRUCTURE
          1.   Static Processing
          2.   Dynamic Processing
               a.   Label Assignment
               b.   Difference Set Resolution
               c.   Dynamic Equivalence
               d.   Backup/Fixup

III. PREPROCESSOR
     A.   PROBLEM SPECIFICATION
     B.   DESIGN FOR A CONTEXT FREE ENVIRONMENT
          1.   Overview
          2.   Structure of the Condition Preprocessor
          3.   Preprocessor Data Structures
          4.   Implementation
     C.   DESIGN FOR A CONTEXT SENSITIVE ENVIRONMENT
          1.   Overview
          2.   Implementation

IV.  CONCLUSIONS AND RECOMMENDATIONS
     A.   SYNTHESIZER
     B.   CONDITION PROCESSING

APPENDIX A:  PROGRAM LISTING FOR SYNTHESIZER

LIST OF REFERENCES

BIBLIOGRAPHY

INITIAL DISTRIBUTION LIST


LIST OF FIGURES

1.   Initialized Sequent for the Square Root Problem
2.   An Example Computation
3.   PSI's Modular Design
4.   Input Trace
5.   Nondeterministic Moore Machine
6.   Deterministic Moore Machine
7.   Instruction-Condition-Instruction Triple
8.   Chaining of Difference Set Relation
9.   Non-deterministic Input Trace
10.  Deterministic Trace
11.  Straight-line Program
12.  Minimum State Machine
13.  Instruction Set Lower Bounds
14.  Typical Input to Static Processor
15.  Moore Machine for Input Trace
16.  Intermediate Trace Table
17.  Trace Table
18.  Partial Trace Labelling
19.  Partially Determined Moore Machine
20.  Trace Table/Failure Memory Configuration for a
     Forced Assignment
21.  Trace Table Entry Showing Arbitrary
     Assignment Method
22.  Nondeterministic Input Trace
23.  Trace Table/Failure Memory Configuration
     After Assignment at the Fourth Level
24.  Nondeterministic Prefix Label Assignment
25.  Trace Table/Failure Memory
26.  Computation Without Explicit Conditions
27.  Computation With Explicit Conditions
28.  Synthesizer Action
29.  ASCII Vector
30.  Default Hierarchy
31.  Modified Hierarchy
32.  Format of Transition Table
33.  Monitor Output
34.  Completed Transition Table
35.  Condition for "Time" and "time"


ACKNOWLEDGEMENT 


We wish to acknowledge Alan W. Biermann for the extra help
that he furnished us while we were doing the research for
this thesis, and for the insights which he gave us on
methods of programming by example.


I.  INTRODUCTION 

A.   BACKGROUND 

Since the introduction of electronic computing machines,
manual tasks that are mundane, tedious and/or repetitious
have been considered for automation. The computer is ideally
suited for this type of work since it neither complains of
boredom nor wanders from its assigned task. The machine
meticulously sequences through a series of computations over
and over, producing answers consistent within the
limitations of the hardware. As consistent as the computer
is at performing tasks, assigning the tasks is still left to
the user of the system.

Programming the early machines was a difficult chore.
Communication between man and machine was possible only
through the language of the machine. This machine language
consisted of binary coded machine operations. The efficient
machine language programmer had to memorize these codes or
keep a list of the codes close by. All control transfer
points had to be coded in absolute machine addresses which
the programmer calculated by hand. A programmer had to
interpret the binary representation of the machine
operations to determine the cause of errors in programs.
There were no diagnostic messages to aid the user in
isolating errors. The difficulty of programming in machine
language led to a search for better ways of generating
programs. The first step was the recognition that the
computer was a good bookkeeper, capable of computing
absolute addresses from labels and translating mnemonic
representations of machine operation codes. Webster's New
World Dictionary, Second Edition, defines mnemonic to be, "a
system or technique of improving memory by the use of
certain formulas." Soon programs were written which would
accept abstract programs containing mnemonics and labels,
convert the mnemonics into machine operation codes and
translate the labels into absolute machine addresses. These
programs produced executable machine language code as
output. These translation programs were called assemblers
and the data they translated were called assembly language
programs.

Assembly language provided some automation of the manual
tasks associated with machine language programming. An
important convenience of assembly language is the
readability of the programs when compared to machine
language programs. The mnemonics convey the meaning of their
function while the labels relieve the programmer of
calculating absolute addresses for control transfer points.
Assembly language provided a level of abstraction which
allowed programmers to concentrate on the programming
problem without dealing with every atomic machine operation.
The assembler performed bookkeeping, address translation and
mnemonic decoding quickly and efficiently. With assembly
language, programmers were now capable of producing more
code in less time with fewer errors.

Assembly language eased the programmer's task but it
still could not be considered a panacea for computer-human
interaction. Assembly language still required the programmer
to maintain control over many machine operations and he had
to provide the logic to control the flow of program
execution. The instructions used to perform control
functions appear as similar code fragments in most programs
written in assembly language. These code fragments performed
functions such as controlling branching decisions and keeping
count of loop indices. When it was observed that common code
fragments appeared across a wide range of assembly programs,
it was recognized that these code fragments could be
represented as a single instruction and the computer could
translate the single instruction into the code fragment it
represented. The programs that translate these complex
instructions are called compilers or interpreters. The
compiled or interpreted languages that followed assembly
language in this evolutionary process incorporated the
program fragments as single instructions of the language.
Constructs such as FOR, DO WHILE and IF THEN are examples of
higher level control structure implementation.

FORTRAN was the first in a long line of higher level
languages. FORTRAN differed from the others by becoming
endeared to a family of users and the language endures today
as one of the most frequently used higher level languages.
What qualities of the language produced this popularity?

The FORTRAN language is attributed to John Backus. His
primary goal when designing the language was to make the
language resemble the notation used in high school algebra.
Since the notation used in high school algebra was familiar
to a wide audience, FORTRAN gave a friendly appearance. The
language's apparent simplicity is the endearing quality of
FORTRAN. Some other language implementors failed to
recognize this point and their languages never received wide
acceptance. ALGOL is an example of a powerful language that
never received the acceptance anticipated.

Other programming languages that followed added compact
representations of other recurring program fragments. The
higher level constructs were not limited to control
structures but also included constructs for data
manipulation functions. Iverson's [1] APL (A Programming
Language) provided powerful operators capable of performing
complex functions such as matrix multiplication in one
instruction.

This trend continues today. Many of the newer languages
implement sophisticated and powerful operators and control
structures. Some of these languages are for a select segment
of computer users, intended for application to a particular
domain. The users are expected to be familiar with the
domain, so the form of the language should be familiar to
the user also. A problem with a domain specific language is
its inability to adapt to other areas. To work in another
area the user must become familiar with another language. A
phenomenon demonstrated by many computer users is a
reluctance to adapt themselves and learn a new language that
may be more appropriate for a given task. Either they break
the egg with a sledge hammer or dig the well with a spoon.
When required to use a new language, the user will likely
use only a small subset of the language that is capable of
doing the job. Worse than using only a subset of the
language features is the tendency to bring old programming
styles applicable to the old language into the new language.
The point is that learning a new programming language is a
hard chore and is avoided whenever possible.

Another direction which the automation of programming
tasks has taken is the development of a programming
environment. A programming environment automates some of the
manual chores by providing the user with aids that assist
him in constructing programs. The environment includes a
programming language, an interactive syntax-directed editor
and an on-line debugger. The editor provides syntax error
diagnostics while the programmer is creating the source
file. The programmer is forced to correct the syntax error
immediately before the editor will allow him to continue
programming. The error should be readily apparent to the
programmer because it is in the latest input. The on-line
debugger allows the programmer to actively test his program,
halt execution, check the value of variables, change the
value of variables or change the code itself. Program
environment systems may even allow the programmer to switch
from the editor to the on-line debugger and back at any
time. A programming environment can be summarized as a
friendly interface built around an intelligent editor which
can recognize syntax errors in the associated programming
language and which is accompanied by other interactive
programming tools.

Programming has been called an art form requiring
intellectual creativity. The automation of intellectual
behavior is a field of study within Computer Science called
Artificial Intelligence. The study of the automation of
programming tasks which require human-like reasoning is
called Program Synthesis or Automatic Programming. It is not
our intention to provide a definition of intelligent
behavior for a machine since there is considerable
disagreement even among the experts. However, we note that
the goal of research in automatic programming is the same
goal that led to all the advances in programming languages.
Informally, this goal is to make the interaction between man
and computer as painless as possible. That is, painless for
the man but not necessarily for the computer. Dijkstra [2]
objects to our automation of programming by claiming, "We
should not automate programming even if we can, because it
would take away our enjoyment of the task." We note there
are those who may require the use of computer services but
have neither the time nor the inclination to obtain the
required education to do that chore. These include
professionals such as lawyers, physicians, and even
theoretical physicists. We assume that, if programming
becomes fully automated, the programmers will then turn
their attention toward other creative and stimulating
pursuits. R. Hamming has said, "The purpose of computing is
insight not numbers."

Many on-going efforts are aimed at providing better
systems for the user so he may create programs faster, with
fewer errors and with less effort. The history of
programming language development has shown that automation
of many programming tasks is feasible. How much more of the
programming task can be automated? What would be considered
the ultimate system for producing computer programs?

B.   AUTOMATIC PROGRAMMING

1.   General

Program synthesis or automatic programming is a
research topic concerned with the development of systems
that provide more and more automation of the programming
process, particularly those tasks requiring human-like
reasoning. The goal is not to create systems that program
themselves, but to create systems which can construct, under
the direction of a user, programs that perform some
function he desires. These systems must be easy to use, easy
to learn, and increase the efficiency of the user. The users
of these systems will no longer be restricted to the few
computer professionals, but will include other professional
fields as well as non-professionals. Automatic programming
systems are to interact with the user, recognize
requirements, and then synthesize a correct program that
satisfies the requirements.

Two questions arise in the research on automatic
programming. First, what is the form of the interaction
between the user and the system? This question is called the
specification problem because it is concerned with issues
relating to how the user is to inform the system of his
requirements. The second question is, given a specification
method, what synthesis technique can be applied to
transform the specification into an appropriate program?
The technique used for synthesis is often dependent
upon the form of the problem specification and most of the
projects involving automatic programming consider both
problems together. It has been proposed by Green [3] that
the two questions should be separated with research
proceeding concurrently on both problems. He proposes a
standard intermediate representation of the problem
specification which would permit interaction between the two
problems.
Four techniques proposed for the specification problem
dominate the literature on automatic programming. Each of
the proposed techniques of problem specification introduces
a different approach to the synthesis problem. The four
specification techniques can be categorized as follows:

1.  Natural Language.

2.  Formal Problem Specification.

3.  Input-Output Pairs.

4.  Example Computations.

Each of these specification techniques is discussed in the
following subsections, along with its relationship to a
synthesis approach.

2.   Problem  Specification  with  Natural  Language 

A visionary approach to the specification problem is
the use of natural language. Natural language provides a
fast, comfortable method of communication which is already
understood by humans. Implementation of a natural language
understanding system has proven to be a very difficult
problem (Glass [4]).

Two forms of natural language are the spoken form
and the written form. Understanding spoken language
increases the degree of difficulty because the communication
is in the form of audio waves. Once the audio input is
captured, it must be converted into another form for further
syntactic and semantic analysis. The reader will note that
once the audio input has been captured and converted, the
problem of written and spoken language becomes the same.
That is, the internal representation of the spoken and
written word can be the same and the problem becomes one of
inferring meaning from the representation. Future advances
in voice understanding hardware can be expected and these
advances may be expected to find their way into use.

A complete natural language understanding system
would be expected to be able to understand all grammatically
correct sentences. However, natural languages do not have
finite grammars. This complexity implies a complete
understanding system cannot be implemented. However, a
system capable of understanding a subset of natural language
can prove useful in specific domains. Early examples of
programming through natural language dialogue are presented
in a survey by Heidorn [5]. Current work on understanding
natural language may be found in Biermann [6] and Walter
[7].

In conclusion, natural language understanding is a
difficult problem that can be solved only in limited
domains. The use of natural language in programming has been
shown to be possible by Heidorn [5] and by Biermann [6] in
limited domains. The systems developed up to today have been
experimental systems and the results will aid in
understanding the problem. Natural language programming
systems will not be available for industry for at least a
decade. Finally, we present the example Biermann [6]
describes as a natural language specification for a problem.
This example is quoted from his paper on natural language
programming. Its intent is to give a feel for programming in
natural language. This example does not specify the
algorithm that is to be used, although a natural language
programming system would be capable of accepting such a
specification.

"When I ask for a status report on a doctoral student,
give me his or her year in grad school, source and
amount of financial support, and which core exams
have been passed. If the student has begun a thesis
give me the advisor and thesis topic."

3.   Formal  Problem  Specification 

The second technique is formal specification of the
problem. As the name implies, the input is in a more rigid
structure than natural language. This technique allows the
user to convey the behavior he desires the synthesized
program to have without specifying the algorithm that is to
be used. Smith [8] gives the following definition for the
form of a formal specification of a problem A.

"A(x) = z such that z ∈ S & P(z,x) where x ∈ D &
I(x), where D and S are the input and output data
types respectively, and I and P are the input and
output conditions respectively."

An example of a formal problem specification for a program
to compute the integer square root of a nonnegative integer
n may be found in Manna and Waldinger [9].

"sqrt(n) <== FIND z SUCH THAT
             integer(z) & z**2 =< n < (z + 1)**2
             WHERE integer(n) & 0 =< n"

In the above example n is an element of the input data type,
z is an element of the output data type, sqrt is the problem
name, integer(n) & 0 =< n is the input condition, and
integer(z) & z**2 =< n < (z + 1)**2 is the output condition.
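
The roles of the input and output conditions can be
illustrated with a minimal Python sketch. The function names
below (input_condition, output_condition, sqrt_by_search)
are hypothetical and are not taken from the cited work; the
naive search is included only to show that the specification
constrains behavior without naming an algorithm.

```python
def input_condition(n):
    """I(n): integer(n) & 0 =< n."""
    return isinstance(n, int) and n >= 0


def output_condition(z, n):
    """P(z, n): integer(z) & z**2 =< n < (z + 1)**2."""
    return isinstance(z, int) and z * z <= n < (z + 1) * (z + 1)


def sqrt_by_search(n):
    """One of many programs meeting the specification.

    Illustrative assumption only: a linear search over z. A
    synthesizer would be free to produce any other program
    that satisfies the same output condition.
    """
    if not input_condition(n):
        raise ValueError("input condition violated")
    z = 0
    while not output_condition(z, n):
        z += 1
    return z
```

Any candidate program can be checked against the
specification by evaluating output_condition on its result,
which is what makes the specification independent of the
algorithm chosen.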

Formal problem specification and its application to
the program synthesis problem can best be explained through
examination of the work by Manna and Waldinger [9], Manna
and Waldinger [10], and Smith [8]. Although all of the work
is similar in that the formal specification is changed into
an appropriate program by some form of rewrite, it is
valuable to differentiate the approaches by their rewriting
methods.

The first example is the system of Manna and
Waldinger [9]. Their system, called a deductive approach,
converts the formal specification into a program in some
target language. Their approach "combines techniques of
unification, mathematical induction, and transformation
rules into a single system." The following is a brief
explanation of this conversion.

A structure is needed to contain initial and
intermediate results of the conversion process. This
structure is called a sequent. The sequent is a tableau
containing two lists. The first list is a list of assertions
and the second list is a list of goals. Each element in
either list may have an output expression associated with
it. Figure 1 represents a sequent as a table. Each row in
the table may contain either an assertion or a goal but not
both. Figure 1 is the initial sequent for the integer square
root problem given above. The input condition has been
placed in the assertion list and the output condition placed
in the goal list. The output variable is associated with the
output condition in the output expression column. This
initialization assumes the input condition is true, and a
search is then made for a proof of the goal, that is, of
the output condition.


sqrt(n) <== FIND z SUCH THAT
            integer(z) and z**2 =< n
            and n < (z+1)**2
            WHERE integer(n) and 0 =< n


| Assertions   | Goals          | Output   |
|              |                | sqrt(n)  |
|--------------|----------------|----------|
| integer(n)   |                |          |
|   and        |                |          |
| 0 =< n       |                |          |
|--------------|----------------|----------|
|              | integer(z)     |          |
|              |   and          |          |
|              | z**2 =< n      | z        |
|              |   and          |          |
|              | n < (z+1)**2   |          |

Figure 1.  Initialized Sequent for the Square Root Problem
During this search, if the sequent ever contains a row where
the assertion can be trivially shown to be false or the goal
shown to be true, and if the output expression for that row
contains only primitives from the target language, then the
output expression is taken as the desired synthesized
program.

Once the tableau is initialized, the system's
deductive rules are applied to the assertions and goals. The
application of these rules will cause the creation of new
assertions and goals and associated output expressions. The
rules may then be applied to the new goals and assertions
until the condition for a program is satisfied. The
application of the rules changes the entries in the tableau
without changing the meaning of the tableau. We recommend
that the interested reader review the original work for a
description of the rules and their application.

The attraction of this theorem-proving technique is
that the resulting program can be proven correct by the same
steps used to create it. Currently there is not a running
implementation of this technique. One of the implementation
questions is determining what rule to apply at each step in
the synthesis process. This problem can be viewed as a
search through all possible sequences of rule applications.
This search space may become astronomical for any relatively
complex program since it may require hundreds of rule
applications. What is needed is a mechanism that can control
the search in a reasonable fashion. The form of control may
be heuristic in that there is a feel for where a rule should
be applied. If this intuitive feel can be quantified, then
this technique may become practical.

Earlier work by Manna and Waldinger [10] on the
DEDALUS automatic programming system also required formal
problem specifications. The DEDALUS system, an implemented
automatic programming system, utilized only transformation
rules. A transformation rule simply rewrites a portion of the
specification into another equivalent form. The repeated
application of these rules would eventually result in a
program in the target language.

4.  Input-Output  Pair  Specification 

Input-output pairs is a method of describing a
problem with examples of input and output behavior. For
example, if someone wanted to describe a program to compute
the Fibonacci numbers then he could supply the input-output
pairs:

(1, 1)
(2, 3)
(3, 5)
(5, 8)
(8, 13)

The goal of a synthesizer system is to determine the
desired program from the examples of the input-output
behavior. One approach is to enumerate all possible programs
in the target language in order and test each program for
the desired behavior. That is, test each enumerated program
by giving it the input from each of the examples and see if
the program will give the associated output. The enumeration
will produce the correct program at some point, but it
cannot be determined whether an arbitrary program can
produce the desired behavior (see Biermann [11]). Therefore,
the following theorem is given by Biermann, "The programs for
the partial recursive functions cannot be generated from
samples of input-output behavior." A large class of programs
may be inferred from examples of input-output pairs provided
they belong to a class of programs where the halting
problem is decidable. Smith [12] and Summers [13] have
looked at the synthesis of LISP programs from example
input-output pairs. It has been shown that a restricted
class of LISP programs can be synthesized from example pairs
without enumeration over the class. The reader is invited to
review Biermann [14] and Gold [15] for theoretical
background information.
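
The enumerate-and-test approach can be sketched in Python
under strong simplifying assumptions. The target language
here is a hypothetical one invented for illustration:
arithmetic expressions over the input x, the constants 1 and
2, and the operators + and *. Every program in this language
halts, so it falls within the decidable class mentioned
above; the names expressions and synthesize are likewise
assumptions, not part of any cited system.

```python
def expressions(size):
    """Yield (text, function) for each expression with `size` leaves."""
    if size == 1:
        yield "x", lambda x: x
        yield "1", lambda x: 1
        yield "2", lambda x: 2
        return
    # Split the leaves between a left and a right subexpression.
    for left_size in range(1, size):
        for lt, lf in expressions(left_size):
            for rt, rf in expressions(size - left_size):
                yield (f"({lt} + {rt})",
                       lambda x, lf=lf, rf=rf: lf(x) + rf(x))
                yield (f"({lt} * {rt})",
                       lambda x, lf=lf, rf=rf: lf(x) * rf(x))


def synthesize(pairs, max_size=4):
    """Return the first enumerated expression consistent with all pairs."""
    for size in range(1, max_size + 1):
        for text, fn in expressions(size):
            # Test the candidate on every example input.
            if all(fn(x) == y for x, y in pairs):
                return text
    return None  # nothing within the size bound fits the examples


# Hypothetical example pairs describing the function 2x + 1.
example_pairs = [(0, 1), (1, 3), (2, 5)]
candidate = synthesize(example_pairs)
```

Enumerating by increasing size returns a smallest consistent
expression, but the number of candidates grows rapidly with
size, which is the cost the restricted LISP synthesis methods
avoid.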

5.   Example Computations

Program specification using example computations
allows more information to be obtained from the user. An
example computation is a sequence of instructions, without
an explicit control structure, which the user provides the
system in order to describe the behavior he wants from a
program. Examples are a good communication method which
people use to describe new concepts or explain new
processes. To describe a problem to the computer the user
uses the available instructions and provides an example of
what he wants done. Figure 2 shows an example computation
that demonstrates how to compute the first 10 Fibonacci
numbers.

In Figure 2 the two operand instructions (MOV, ADD)
perform the action on the two operands and leave the result
in the first operand. For example, if A = 2 and B = 3 then
ADD A,B would result in A = 5 and B = 3. All of the
instructions perform an action on some variables except for
the START, HALT, and NOTE instructions. START and HALT flag
the beginning and end of the program respectively. The NOTE
instruction provides information on the reason for the
execution of the next instruction.
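
The instruction semantics just described can be sketched as
a small Python interpreter for a straight-line trace. The
tuple encoding of instructions and the function name run are
assumptions made for illustration; only the behavior of MOV,
ADD, START, HALT, and NOTE follows the description above.

```python
def run(trace, env):
    """Execute a straight-line trace and return the final variables.

    Each step is a tuple such as ("ADD", "A", "B"). Operands
    are variable names; allowing an integer literal as the
    second operand is an added assumption.
    """
    env = dict(env)  # leave the caller's variables untouched
    for opcode, *operands in trace:
        if opcode in ("START", "HALT", "NOTE"):
            continue  # these instructions act on no variables
        dst, src = operands
        value = env[src] if isinstance(src, str) else src
        if opcode == "MOV":
            env[dst] = value              # result left in first operand
        elif opcode == "ADD":
            env[dst] = env[dst] + value   # result left in first operand
        else:
            raise ValueError(f"unknown instruction {opcode}")
    return env


# The example from the text: A = 2, B = 3, then ADD A,B.
result = run([("START",), ("ADD", "A", "B"), ("HALT",)],
             {"A": 2, "B": 3})
# result == {"A": 5, "B": 3}
```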

This method of specification depends on the user to
supply more information about the problem, including the
algorithm to be synthesized. The algorithm is implicitly
defined by the example computation that is given. This
specification technique should be contrasted with the
previous techniques. Note that the formal specification and
the input-output pair specification only required the user
to specify the desired behavior without specifying the
algorithm. Thus it can be claimed that these two methods
intentionally ignore information that the user has, assuming
that most users have an idea of the form of the algorithm.
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Figure   2.    An    Exa-nple    Computation 
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Toe  primary  contributor  to  tne  understanding  of 
program  synthesis  nas  been  Alan  rf.  Biermann  (see  Biermann 
and  irishnaswamy  [16J  and  Biermann,  Baum  and  Petry  [lVJ  ) .  In 
particular,  Biermann  [16j  provides  a  formal  definition  of  an 
algoritnm  tnat  will  syntnesize  programs  from  example 
computations.  The  alfforithm  and  variations  nave  provided  tne 
basic  structure  upon  which  tnis  tnesis  nas  been  developed. 
Briefly,  tne  algorithm  identifies  tne  conditions  tnat  may 
nave  inadvertently  (or  purposely)  been  left  out  of  tne 
computation.  A  condition  is  a  predicate  as  defined  in 
predicate  calculus.  Tnat  is,  an  entity  for  which  a  trutn 
value  may  be  measured.  Once  tne  omitted  conditions  nave  been 
inserted,  tne  algoritnm  finds  a  labelling  for  tne 
instructions  sucn  that  a  program  witn  a  minimum  number  of 
instructions  is  produced.  To  explain  this  labelling,  assume 
the  instruction  ADD  A,B  appears  in  three  different  locations 
in  an  example  computation  (see  Figure  2).  Suppose  it  was 
icnown  that  there  has  to  oe  two  occurrences  of  tne 
instruction.  Then  two  of  tne  instructions  could  be  labeled 
witn  a  l  and  tne  otner  instruction  labeled  witn  a  2  to 
indicate  that  the  instruction  labeled  2  is  different  from 
tne  instructions  labeled  l.  Finding  the  labels  for  tne 
instructions  in  the  example  computations  requires  an 
enumeration  search  of  all  possible  labellings.  The  labelling 
selected  is  tne  first  labelling  tnat  produces  a  program  tnat 
is  deterministic. 
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This  algoritnm  is  complete  and  the  synthesized 
programs  are  sound.  Completeness  means  tnat  tne  algorithm 
can  synthesize  every  possioie  program.  Soundness  mean  tnat 
the  synthesize  program  will  correctly  execute  the  example 
used  to  construct  it.  A  disadvantage  of  tnis  synthesis 
method  is  the  algorithm  is  an  enumeration  search  and  in  the 
worst  case  will  require  exponential  time  on  tne  length  of 
the  example  computation  to  find  a  solution.  Techniques  nave 
been  developed  to  speed  up  this  search  that  will  produce 
satisfactory  response  for  most  praticai  programs, 

6. A General Automatic Programmer Design

Before leaving this section on automatic programming, we wish to
discuss a design for an automatic programmer that uses at least
two of the specification techniques. The system is named PSI and
was designed by a group of researchers at Stanford's Artificial
Intelligence Laboratory. The research effort was headed by Cordell
Green [3]. Green has presented a high level design of an
autoprogrammer that identifies some of the more important areas
that need further research. Green admits that the design was an
effort to focus attention on some of the sub-areas of the overall
synthesis problem. His modular design does focus attention on
different aspects of the problem. The design decision to split the
overall problem into two main sub-problems of acquisition and
synthesis is of particular interest. This design choice allows
work to proceed concurrently on two hard problems, with the
interface between the problems being some intermediate
representation of the problem.

PSI is a knowledge-based program understanding system organized as
a collection of interacting modules. Figure 3 details the high
level modular design of the PSI system. The PSI design divides the
system into two groups. The acquisition group interfaces with the
user and collects the specification given by the user, while the
synthesis group produces a program in some target language that
meets the user's requirements. Communication between the two major
groups is through an intermediate representation called the
program model. The goal of the acquisition group is to accept the
user's specification by either natural language dialogue or by
traces, and present a unified entity to the synthesizer group. The
implementation of the synthesizer group is then simplified because
of the consistent representation it receives. Since the user's
input is converted into an intermediate representation that is
supplied to the synthesizer group, the user is free to switch from
one specification technique to another during program
specification.

The overall interaction with the user is meant to be through
natural language dialogue. Since natural language understanding is
not currently within the state of the art, the system must
interact in a subset of natural language limited to a particular
domain.

The system-user interaction is to appear as natural as possible.
The system has been designed to include a mixed-initiative
dialogue capability, which means the user or the computer can
assume the dominant communication role at different times during
the discourse. This allows the user to provide as much knowledge
as he can to help the synthesis process and allows the computer to
assist the user by asking questions or providing responses. The
system develops a current model of the user and a model of the
context that assists the system in determining when to assume the
initiative and what questions to ask the user.

A  partial  implementation  was  completed  in  197b  that 
included the synthesis expert and the efficiency expert from the
synthesis group. The acquisition group modules have proven to be a
more difficult assignment and only portions of the acquisition
group have been implemented. The important point of the PSI design
is that it provides a modular division of the program synthesis
problem that helps provoke study into these sub-problems.

C.   OBJECTIVES 

Automatic programmers that synthesize programs from example
computations require conditions to be explicitly defined by the
user in order to generate programs with a minimum number of
instructions. Previous work (Biermann and Krishnaswamy [16], and
Biermann [18]) has reduced the number of required conditions, but
has not eliminated the need for the user to explicitly state a
minimal set of conditions.

The explicit definition of conditions is not a natural part of an
example computation. That is, one would not normally give control
structure information when using examples to explain how a task is
to be performed. Our objective is to provide an environment where
the user may define the tasks he wants accomplished without
explicitly defining the control structures that specify the flow
of execution in a synthesized program.

We will implement an automatic programming system based upon the
example computation specification method in order to study the
feasibility of identifying conditions from user actions. We limit
this study to the domain of text editing in order to provide a
well defined area in which to work. It is hoped that the results
of our efforts may provide insight into the overall problem and
generate further research which will extend condition
identification to other domains.

D.   THESIS  ORGANIZATION 

The thrust of this thesis is the development of methods for the
automatic construction of conditions necessary for the proper
synthesis of programs from example computations. Example
computation is one approach to the problem of program synthesis.
Chapter One introduces the reader to program synthesis and gives a
brief historical perspective of the evolution of this field of
study. Chapter One also provides a comparison of the different
proposed approaches to this problem.

An automatic programmer has been implemented to support this
research. This synthesizer was developed to use the example
computation method for program specification. Chapter Two is a
detailed explanation of our particular implementation. Chapter Two
includes a discussion of techniques we have incorporated in our
implementation which speed up the synthesis process.

Chapter Three presents our approach to generating conditions given
an example computation. It describes algorithms which will
generate conditions from a sequence of editor instructions.

Chapter Four discusses the results of our research. A brief
discussion is included on the merits of the synthesizer which we
have implemented and recommendations are given for potential
improvement. Finally, Chapter Four presents a review of our work
on identification and construction of conditions from example
computations. Areas requiring further research have been
highlighted and examples of possible applications to other domains
have been pointed out.

II. SYNTHESIZER

A.   GOALS 

There is a two-fold purpose behind designing and building the
program synthesizer. The first directly relates to the usefulness
of the synthesizer. It is hoped that by "laying the groundwork"
for an autoprogramming system, the impetus will be provided that
will eventually result in a total automatic programming
environment being available for the user. This environment is
envisioned as an interactive one consisting of several components:
an interface to provide the user with the means to perform example
computations; a link between the interface and the synthesizer
which records the user actions and transmits a trace of those
actions to the synthesizer; the synthesizer itself, which produces
the algorithm in some internal form; and, finally, a translator
that receives the internal representation of the algorithm and
translates it into machine-readable form and/or user-readable
form. The second purpose for which the synthesizer is built is to
provide a suitable vehicle to be used in the main area of research
that this thesis explores. If an autoprogrammer can generate
correct algorithms from example computations, how much can be done
to relieve the user from having to include branching or looping
conditions in his example computations?

B. OVERVIEW

1. General Description

An automatic programming system which produces programs based upon
the user's input of example computations has a natural appeal.
Example computations are sequences of instructions performed in an
algorithmic manner. For instance, if the user is doing a matrix
multiply, computing the entry for the resultant matrix involves
the sum of products from the appropriate row and column of the
multiplicand and multiplier matrices, respectively. When humans
communicate ideas to each other, the proper use of example
computations often plays a vital role. It is hard to imagine
trying to explain the method of multiplying two matrices together,
or trying to explain the concept of set-subset relationships,
without being able to draw examples that enhance the explanations.
This method of communication seems to be vital to human
understanding of algorithms. Since programmers often use small
example computations while coding programs, it seems that a
logical approach to automatic programming would consist of the
machine doing the actual program synthesis based upon example
computations given by the programmer.

Program synthesis is the act of putting instructions together in
such a way that an algorithm is built which accomplishes a desired
task. Obviously, an algorithm which is an exact replication of the
sequence of instructions will accomplish the task, but it is
uninteresting since it cannot be generalized to accomplish a set
of related tasks. For example, a linear sequence of instructions
which multiplies two 2 x 2 matrices together will only work for
2 x 2 matrices; however, by allowing loop constructs and if-then
constructs, an algorithm can be produced which performs the more
general task of multiplying any two matrices with legal row and
column dimensions. So, in the case of the matrix multiply, the
task of the program synthesizer is to produce a general matrix
multiply algorithm given the example computation for a 2 x 2
matrix multiplication in some form such as:

c[1,1] = a[1,1] * b[1,1] + a[1,2] * b[2,1]
c[1,2] = a[1,1] * b[1,2] + a[1,2] * b[2,2]
c[2,1] = a[2,1] * b[1,1] + a[2,2] * b[2,1]
c[2,2] = a[2,1] * b[1,2] + a[2,2] * b[2,2]
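
To make the target of this generalization concrete, the sketch
below places a straight-line program that replays the four
assignments above next to a looped program that handles any legal
dimensions. Both functions and their names are illustrative
assumptions written for this discussion; they are not output of
the synthesizer. Indices are 0-based in the code, whereas the
example computation is 1-based.

```python
def matmul_2x2_straight_line(a, b):
    """Replay the four assignments of the example computation (2 x 2 only)."""
    c = [[0, 0], [0, 0]]
    c[0][0] = a[0][0] * b[0][0] + a[0][1] * b[1][0]
    c[0][1] = a[0][0] * b[0][1] + a[0][1] * b[1][1]
    c[1][0] = a[1][0] * b[0][0] + a[1][1] * b[1][0]
    c[1][1] = a[1][0] * b[0][1] + a[1][1] * b[1][1]
    return c


def matmul_general(a, b):
    """Multiply an n x m matrix by an m x p matrix using loop constructs."""
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("column count of a must equal row count of b")
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul_2x2_straight_line(a, b) == matmul_general(a, b))
```

The two programs agree on the 2 x 2 example, but only the looped
form carries over to other dimensions.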

Generalizing from the example computation also requires some means
of noting when the array bounds have been reached for this
example. In other words, conditions have to be interposed between
some instructions where a change in the flow of control for the
algorithm is necessary. An input trace is defined as a sequence of
instructions and conditions which describes the example
computation. In the matrix multiply example this might be
accomplished thus:

C[1,1] = 0
C[1,1] = C[1,1] + A[1,1] * B[1,1]
C[1,1] = C[1,1] + A[1,2] * B[2,1]
COND - col index of A = col size of A
C[1,2] = 0
C[1,2] = C[1,2] + A[1,1] * B[1,2]
C[1,2] = C[1,2] + A[1,2] * B[2,2]
COND - col index of A = col size of A
...
C[2,2] = C[2,2] + A[2,2] * B[2,2]
COND - row & col index of C = Dimension of C
STOP

The program synthesizer used for this thesis is designed around
concepts and ideas on synthesizing a program given example traces
as described in reference [17]. Previous research, references
[16], [17], and [18], seems to indicate that correct programs can
be synthesized on the basis of relatively few sample computations,
but that the amount of time required to do the synthesis grows
very quickly as a function of program complexity.

2. Trace Coding

The synthesis procedure is domain independent; that is, the input
trace can be coded into any consistent representation, and it will
not affect the operation of the synthesizer. Since the synthesis
procedure is independent of the input trace representation,
alphanumeric characters will be used to represent instructions and
conditions. They are distinguished from each other by their
position within the trace rather than by their symbolic
representation. For example, an 'a' might represent an instruction
or a condition. Within the instruction set itself, identical
instructions are encoded as identical symbols. A simple trace of a
routine to find all positive numbers in an input stream might be:


A = 0
READ B
COND - B is negative
A = A + 1
READ B
COND - B is negative
A = A + 1
READ B
COND - B is positive
PRINT B
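
The coding step can be sketched as follows. The helper
encode_trace, its input format, and the symbol '-' used for a
transition with no stated condition are assumptions made for
illustration; the text does not say how an unconditioned
transition is coded. Instructions and conditions draw their
symbols from the same alphabet independently, since position
within a triple tells them apart.

```python
import string


def encode_trace(items, no_condition="-"):
    """Code a trace as instruction-condition-instruction triples.

    items is a list of ("I", text) entries for instructions and ("C", text)
    entries for conditions. Identical texts always receive identical symbols.
    This sketch supports at most 26 distinct instructions or conditions.
    """
    instr_codes, cond_codes = {}, {}

    def symbol(table, text):
        # Assign the next unused lowercase letter on first sight of a text.
        if text not in table:
            table[text] = string.ascii_lowercase[len(table)]
        return table[text]

    triples = []
    lead, cond = None, no_condition
    for kind, text in items:
        if kind == "C":
            cond = symbol(cond_codes, text)
            continue
        trail = symbol(instr_codes, text)
        if lead is not None:
            triples.append(lead + cond + trail)
        lead, cond = trail, no_condition
    return triples, instr_codes, cond_codes


trace = [
    ("I", "A = 0"), ("I", "READ B"), ("C", "B is negative"),
    ("I", "A = A + 1"), ("I", "READ B"), ("C", "B is negative"),
    ("I", "A = A + 1"), ("I", "READ B"), ("C", "B is positive"),
    ("I", "PRINT B"),
]
triples, instr_codes, cond_codes = encode_trace(trace)
print(triples)
```

Which letter a given instruction receives is immaterial; what
matters is that every occurrence of it receives the same one.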


If the instruction A=A+1 is represented by a 'b', each occurrence
of that instruction in the trace will have to be represented by a
'b'. The reason for this constraint is straightforward: since the
synthesizer only receives a trace of the example execution, it
cannot determine whether A=A+1 is the same instruction being
encountered repeatedly in a loop, as it is in this example, or
whether there are several independent occurrences of A=A+1. Figure
4 is an example of a typical coded input trace. The left-hand
column entries are conditions and the right-hand column entries
are instructions. Figure 4 is read as state 's' transitions on
condition 'x' to state 'a', which in turn transitions on 'x' to
state 'b', and so forth.


transitions   states

              s
x             a
x             b
x             c
x             b
y             e
x             c
x             b
x             c
y             s
y             a
x             b
y             a
x             b
x             f
i             d
x             b
x             f
x             d
y             s

Figure 4. Input Trace
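
Reading the coded trace this way amounts to tabulating, for each
(state, condition) pair, the states that follow it. The sketch
below does so for the first fourteen transitions of the Figure 4
trace. The function names are assumptions made for illustration.
Pairs that end up with more than one successor are the places
where one state per instruction symbol is not enough.

```python
from collections import defaultdict

# (condition, next state) pairs for the first fourteen transitions of the
# Figure 4 trace, which starts in state 's'.
FIG4_PREFIX = [
    ("x", "a"), ("x", "b"), ("x", "c"), ("x", "b"), ("y", "e"),
    ("x", "c"), ("x", "b"), ("x", "c"), ("y", "s"), ("y", "a"),
    ("x", "b"), ("y", "a"), ("x", "b"), ("x", "f"),
]


def transition_table(start, steps):
    """Map each (state, condition) pair to the set of states that follow it."""
    table = defaultdict(set)
    state = start
    for cond, nxt in steps:
        table[(state, cond)].add(nxt)
        state = nxt
    return dict(table)


def ambiguous(table):
    """Keep only the pairs that lead to more than one successor state."""
    return {pair: nxt for pair, nxt in table.items() if len(nxt) > 1}


for (state, cond), successors in sorted(ambiguous(transition_table("s", FIG4_PREFIX)).items()):
    print(state, cond, sorted(successors))
```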


3.   Input/Output  Trace  Representation 

A Moore-type representation, as defined in [17], can be used to
highlight certain features that must be dealt with when producing
an algorithm from an example trace. Throughout the rest of the
discussion, Moore machines and algorithms will be used
synonymously. Conditions relate to transitions and instructions
relate to states of the machine. In fact, the function of the
synthesizer can be viewed as that of determining a minimum-state
deterministic Moore machine equivalent of a non-deterministic
Moore machine. Representing input traces as Moore machines will
often show the non-deterministic structure of the example trace.
This non-determinism must be resolved by the synthesizer in order
for an algorithm to be generated. Figure 5 is the Moore machine
representation of the input trace of Figure 4. Notice that at node
'b', the trace is non-deterministic. Transition 'y' leads from
node 'b' to two different nodes; similarly, transition 'x' leads
from node 'b' to two separate nodes. Figure 6 is the deterministic
Moore machine which has been constructed by our synthesizer based
upon the input trace given in Figure 4. The non-determinism has
been resolved by splitting state 'a' into two states distinguished
from each other by an integer prefix label. The assignment of the
prefix label is the mechanism used by the synthesizer to prevent
non-determinism. In order to accomplish this assignment, the
synthesizer uses an enumeration technique. Each instruction is
assigned a prefix label in a manner that maintains determinism and
assures that the algorithm will correctly execute the input trace.
It is easy to verify that the deterministic Moore machine of
Figure 6 will execute the trace.

Figure 5. Non-deterministic Moore Machine

Figure 6. Deterministic Moore Machine

C. SYNTHESIS PROCEDURE

1. Function

The function of the synthesizer program is to provide a
minimum-state, correct program consistent with the input trace of
the example computation. The synthesis process will be completed
when it is determined which occurrence of a labelled instruction
corresponds to each particular instruction in the input trace. In
order to accomplish this goal, the synthesizer is basically
structured as a depth-first search algorithm. Backup and fixup
mechanisms exist to enhance the search procedure when pruning has
not kept the algorithm from traversing a fruitless branch of the
search tree. The search mechanism attempts to assign a label to
each instruction in such a manner that the generated algorithm
remains technically correct; that is, nondeterminism is not
allowed to exist and the original trace can still be executed. A
number of techniques exist within the synthesizer which aid
pruning of the search tree, and thereby make it possible to
synthesize more complicated programs in a reasonable amount of
time than could otherwise be expected from a general enumeration
technique. These techniques offset the major disadvantage of a
general enumerative search technique, namely the exponential
growth of the search space as a function of input.

2. Concepts

Certain definitions and concepts must be presented before the
actual algorithm is discussed. In order to facilitate the
discussion, it is necessary to refer to Figure 7. Each level in
the figure consists of an instruction-condition-instruction
triple, referred to as an I-C-I. In Figure 7 the leftmost symbol
under I-C-I is referred to as the leading instruction of the
triple, the middle symbol is the condition, and the rightmost
symbol is the trailing instruction. The trailing instruction at
level i becomes the leading instruction at level i+1. So this
input trace represents the instruction-condition sequence
's r a n s r a d a x a y a x a n r'.

level   I-C-I

1       sra
2       ans
3       sra
4       ada
5       axa
6       aya
7       axa
8       anr

Figure 7. Instruction-Condition-Instruction Triple

Two levels i and j are said to belong to the same couple-class if
the elements of the levels are the same. Instruction elements of
the trace which are in the same couple-class may be assigned the
same prefix label during synthesis if the assignment does not
cause non-determinism. For example, given the trace in Figure 7,
levels 1 and 3 are in the same couple-class, as are levels 5 and
7. Difference set relations are another situation of interest.
Here the first two elements of level i and level j are the same,
but the third element is not the same. A difference set relation
indicates that the leading instructions cannot be represented by
the same state regardless of the prefix label assigned during
synthesis, because the leading instruction has the same transition
to two different trailing instructions. Again using the above
trace, level 2 and level 8 fall into this category. In this
situation, the index 8 would be entered into the difference set
for level 2. By implication, the index 2 is also in the difference
set for level 8, although, in practice, it is not entered.
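
Both relations can be computed directly from the triples. The
following is a minimal sketch over the Figure 7 trace; the
function names and the dictionary layout are assumptions made for
illustration and are not taken from the implementation described
in this chapter.

```python
from collections import defaultdict

# I-C-I triples of Figure 7, levels 1 through 8.
FIGURE_7 = ["sra", "ans", "sra", "ada", "axa", "aya", "axa", "anr"]


def couple_classes(triples):
    """Group 1-based level indices by identical I-C-I triple."""
    classes = defaultdict(list)
    for level, triple in enumerate(triples, start=1):
        classes[triple].append(level)
    return dict(classes)


def difference_sets(triples):
    """Map a level to the later levels that share its leading instruction
    and condition but have a different trailing instruction."""
    diff = defaultdict(set)
    for i, first in enumerate(triples, start=1):
        for j in range(i + 1, len(triples) + 1):
            second = triples[j - 1]
            if first[:2] == second[:2] and first[2] != second[2]:
                diff[i].add(j)  # entered once, under the lower index
    return dict(diff)


shared = {t: lv for t, lv in couple_classes(FIGURE_7).items() if len(lv) > 1}
print(shared)
print(difference_sets(FIGURE_7))
```

As in the text, each relation is entered only under the lower of
the two level indices; the reverse entry is implied.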

Once the initial couple-class information and difference set
information have been determined, additional difference set
information can be obtained through the chaining nature of
differencing. For example, suppose the trace consists of the one
shown in Figure 8. Then the Moore machine representation of this
trace is shown in Figure 9.

index   trace

4       axa
5       axa
6       ays
...
8       axa
9       axa
10      ayt

Figure 8. Chaining of Difference Set Relations

Figure 9. Non-deterministic Input Trace

This machine is obviously nondeterministic since state 'a'
transitions by 'y' to two different states. Difference set
resolution requires that the index for 'ayt' be in the difference
set of 'ays'. Since that requirement causes different states to
represent the 'a' in 'ayt' and in 'ays', and further since the
trailing 'a' in the preceding level is exactly the same
instruction, the preceding levels now satisfy the difference set
relation. The leading instruction and the condition are the same,
but the trailing instructions in the I-C-I triples are different
since they have previously been assigned to a difference set
relation. Therefore, the lead instruction must be labelled with a
different prefix during assignment, and similarly for the levels
above them. So the Moore machine will now be deterministic and in
the following form.

Figure 10. Deterministic Trace
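
The chaining argument can be sketched as a backward propagation
over pairs of levels. The trace fragment below is shaped after
the chaining example, but its level numbers, the function name
chain_differences and the pair representation are assumptions made
for illustration. Consecutive level numbers are assumed to be
adjacent in the trace, so that the trailing instruction of level
i-1 is the leading instruction of level i.

```python
def chain_differences(triples_by_level):
    """Return every pair of levels (i, j), i < j, whose leading instructions
    must be given different states, including pairs found by chaining."""
    levels = sorted(triples_by_level)
    # Direct relations: same leading instruction and condition, different trailer.
    pending = [
        (i, j)
        for i in levels
        for j in levels
        if i < j
        and triples_by_level[i][:2] == triples_by_level[j][:2]
        and triples_by_level[i][2] != triples_by_level[j][2]
    ]
    must_differ = set(pending)
    while pending:
        i, j = pending.pop()
        before_i = triples_by_level.get(i - 1)
        before_j = triples_by_level.get(j - 1)
        if before_i is None or before_j is None:
            continue  # no preceding level recorded, so the chain stops here
        # The trailers of the preceding levels are the leaders of levels i and
        # j, which are now known to be different states. If the preceding
        # levels agree on leader and condition, their leaders must differ too.
        pair = (i - 1, j - 1)
        if before_i[:2] == before_j[:2] and pair not in must_differ:
            must_differ.add(pair)
            pending.append(pair)
    return must_differ


fragment = {4: "axa", 5: "axa", 6: "ays", 8: "axa", 9: "axa", 10: "ayt"}
print(sorted(chain_differences(fragment)))
```

Only the pair ending in 'ays' and 'ayt' is a direct difference set
relation; the other two pairs follow from it by chaining.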

Given a partial trace derived from the example execution, there
are numerous Moore machines that can be constructed to satisfy the
trace. At one end of the spectrum, a program can be constructed
such that each succeeding state is assigned a different prefix
label. This method always results in a straight-line program. Each
instruction has one transition entering it and one transition
exiting from it. This method produces the maximum size program
consistent with the input trace. See Figure 11. This is not a
particularly desirable method since it does not recognize loop
structures that can significantly reduce the size of the program.
Additionally, it hides the basic structure of the algorithm. The
major advantage, of course, is that absolutely no search is
required to produce a deterministic machine.


Figure 11a. Trace

Figure 11b. Program

Figure 11. Straight-line program

At the other end of the spectrum, a program can be constructed
such that each identical instruction receives the same prefix
label. This method takes full advantage of loop structures, and
will result in a minimum state machine. However, such a method
will seldom produce a deterministic machine; therefore, it will
not produce a satisfactory algorithm. See Figure 12.
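
The two ends of the spectrum can be compared on a small coded
trace. The trace used below is hypothetical, chosen only because
it has two instruction symbols and a loop, and the helper names
are assumptions made for illustration. A state is written as a
pair of instruction symbol and prefix label.

```python
def transitions(states, conds):
    """Map each (state, condition) pair to the set of states that follow it."""
    table = {}
    for lead, cond, trail in zip(states, conds, states[1:]):
        table.setdefault((lead, cond), set()).add(trail)
    return table


def is_deterministic(states, conds):
    """True when no (state, condition) pair has more than one successor."""
    return all(len(nxt) == 1 for nxt in transitions(states, conds).values())


# Hypothetical coded trace: a -x-> a -x-> a -x-> a -y-> a -y-> b
instructions = ["a", "a", "a", "a", "a", "b"]
conditions = ["x", "x", "x", "y", "y"]

# One extreme: every occurrence receives its own prefix label.
straight_line = [(sym, k) for k, sym in enumerate(instructions, start=1)]
# Other extreme: every identical instruction shares prefix label 1.
fully_merged = [(sym, 1) for sym in instructions]

print(len(set(straight_line)), is_deterministic(straight_line, conditions))
print(len(set(fully_merged)), is_deterministic(fully_merged, conditions))
```

The straight-line labelling is deterministic but uses one state
per occurrence; the fully merged labelling uses one state per
symbol but sends 'a' on 'y' to two different states.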


Figure 12a. Trace

Figure 12b. Program

Figure 12. Minimum State Machine

The best solution lies somewhere between these endpoints. A
reasonable first guess at the number of states required to produce
a deterministic machine within this spectrum can be made by
establishing a lower bound on the number of states. The
cardinality of the instruction set is defined as the number of
different instructions appearing in the trace. Using the above
figure as an example, it can be determined that the cardinality of
the instruction set is two; that is, there are two different
instructions, 'a' and 'b', in the trace. This measure provides an
absolute lower bound on the number of states required in the final
machine. This lower bound can be refined by determining a lower
bound on the number of states needed for each individual
instruction. Once again, the above figure illustrates this
concept. The instruction 'a' at level 5 must be different from the
instructions at levels 1 through 4 because of difference set
resolution, or else nondeterminism results on the transition 'y'.
Therefore, in order to maintain determinism, the instruction 'a'
must be allowed at least two states. Summation of the lower bounds
for each of the instructions gives a lower bound on the total
number of states required for the machine. For this particular
example, the program would be generated as:

Figure 13. Instruction Set Lower Bounds

If the search space is viewed as a tree structure, then the levels
of the tree can be associated with the instructions by assigning
the first instruction in the input trace to the first level, the
second instruction to the second level, and so forth. The
branching factor at each level is the state lower bound computed
for the instruction seen at that level. The prefix label assigned
to the instruction is represented by the specific branch used to
traverse to the next level.

The idea of providing a lower bound on the number of states leads
to an iteratively expanding depth-first search. When all possible
combinations of prefix labels have been tried, but the algorithm
remains non-deterministic, the lower bound is incremented and the
search is restarted from the top level. When the lower bound is
increased, the search tree obtains additional paths to the final
solution by increasing the branching factor associated with one or
more instructions. The depth of a successful search into the tree
is restricted by the lower bound on the number of nodes required
by the deterministic machine. Only when a pattern of prefix
assignments has been made which allows the algorithm to remain
deterministic and all of the instructions in the original trace
have been assigned prefix labels will the synthesis terminate.
This mechanism prevents a straight-line model from being output as
the algorithm unless it is the only one that can satisfy the input
trace. More importantly, it provides the minimum-state
deterministic machine capable of executing the input trace.

D.   SYNTHESIZER  STRUCTURE 

The synthesis program is subdivided into two primary modules:
static processing of the input trace; and dynamic processing of
the information extracted from the input trace by the
preprocessing, or static processing, phase. Static processing
provides information such as couple-classes, difference sets, and
lower bounds on the number of machine states. Dynamic processing
uses knowledge inherited from preprocessing to guide the search
mechanism to a final output of the algorithm. These two modules
will be discussed in turn, and the primary mechanisms involved
will be amplified.

1. Static Processing

Static processing can be conceptualized as consisting of three
main functions: (a) accept the input trace; (b) preprocess the
trace for difference sets, couple-classes, and state bounds; and
(c) prepare a trace table for further use by dynamic processing.
Once this preprocessing has been accomplished, the static module
is no longer necessary to the synthesizer.

In the current configuration, the static module expects to
find the input as a sequence of
instruction-condition-instruction triples. Figure 14 is an
example of an input trace.

Figure 14.  Typical Input to Static Processor

Each line consists of a triple, for example 'anp'. The 'a'
represents an instruction, and the 'n' represents the
condition which causes the program trace to transition to
the next instruction 'p'. For each level, the first element
represents the same instruction as the last element of the
preceding level. This is easier to see if the above trace is
represented as a Moore machine in which the nodes are
instructions and the conditions are transitions. State 'a'
transitions on condition 'n' to state 'p', which transitions
on condition 's' to state 'a', which transitions on
condition 'g' back to state 'a', etc.

Figure 15.  Moore Machine for Input Trace

Figure 16.  Intermediate Trace Table

Each occurrence of an instruction symbol in the input trace
is represented by the same state at this point in the
synthesis.
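
As an illustration of the input format, the sketch below
reads triples and builds the initial machine in which every
occurrence of a symbol is one state. It assumes
single-character instruction and condition symbols; the
function names parse_trace and successors and the four-line
sample trace are hypothetical.

```python
def parse_trace(lines):
    """Read instruction-condition-instruction triples, one per level."""
    triples = []
    for level, line in enumerate(lines, start=1):
        symbols = tuple(line.strip())
        if len(symbols) != 3:
            raise ValueError(f"level {level}: expected 3 symbols, got {line!r}")
        # The lead instruction must repeat the previous trailing instruction.
        if triples and triples[-1][2] != symbols[0]:
            raise ValueError(f"level {level} does not continue level {level - 1}")
        triples.append(symbols)
    return triples

def successors(triples):
    """Map (instruction, condition) to the set of instructions that follow."""
    table = {}
    for lead, cond, trail in triples:
        table.setdefault((lead, cond), set()).add(trail)
    return table

trace = parse_trace(["anp", "psa", "aga", "ayt"])
table = successors(trace)
```

Any entry of the table holding more than one successor marks
a state that the synthesis will have to split.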

Once the input trace has been accepted, static processing
can begin. Static processing consists of determining the
level indices associated with each couple-class and with
each difference set. For the trace of Figure 15, these are
shown in Figure 16.

There are two couple-classes in this trace. They are [aga]
at levels 3 and 8, and [rsr] at levels 5 and 6. The
remaining levels are not assigned to a couple-class because
no other levels match with them. Couple-class information is
useful to the dynamic processor for determining forced
assignments and dynamic non-equivalence. These ideas will be
discussed more fully in the section on dynamic processing.
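
Grouping levels into couple-classes amounts to collecting
identical triples. A minimal sketch follows; the nine-level
trace is hypothetical, chosen only so that its classes fall
at the levels quoted above, and couple_classes is an
illustrative name.

```python
from collections import defaultdict

def couple_classes(triples):
    """Return {triple: [levels]} for triples seen at two or more levels."""
    levels_by_triple = defaultdict(list)
    for level, triple in enumerate(triples, start=1):
        levels_by_triple[triple].append(level)
    # Levels whose triple matches no other level belong to no couple-class.
    return {t: lv for t, lv in levels_by_triple.items() if len(lv) > 1}

# Hypothetical trace; levels are numbered from 1.
trace = ["anp", "psa", "aga", "ayr", "rsr", "rsr", "rka", "aga", "ayt"]
classes = couple_classes(trace)
```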

Difference sets exist for levels 3 and 4. Level 4 has a
difference set which contains the index 9; that is, the
element at level 4, 'ayt', must have a different prefix
label on 'a' than the element at level 9, 'ayt'. If the 'a'
is not labelled differently during the synthesis,
nondeterminism will result since the same transition would
lead to different nodes.

Difference set resolution is a very powerful mechanism for
ensuring deterministic behavior of the algorithm. A
considerable number of the prefix label assignments to the
nodes can be resolved using difference sets. Notice that
level 8 appears in the difference set for level 3 even
though levels 3 and 8 are in the same couple-class. At first
this appears contradictory, since equivalent couple-class
names imply that the elements are the same, but difference
set existence forces the lead instructions to be different.
This points out the relative power of couple-class
information and difference set information. Difference set
information is immutable. Couple-class information only
hints at equivalence. In this particular example, the entry
at level 3 was caused by the chaining effect of difference
set resolution. Notice that the 'a' at level 4 must be
different from the 'a' at level 9. Since the trailing 'a' at
level 3 is, by definition, the same as the leading 'a' at
level 4, and the trailing 'a' at level 8 is likewise the
leading 'a' at level 9, the trailing 'a' at level 3 cannot
be the same as the trailing 'a' at level 8; therefore,
levels 3 and 8 cannot be in the same couple-class.
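
One way to compute difference sets, including the chaining
effect, is sketched below. It is an assumption about the
mechanics rather than the procedure actually implemented:
direct conflicts are found first, then conflicts are
propagated backwards through identical triples until nothing
changes. Each pair is recorded at the earlier level only.
The trace is hypothetical.

```python
def difference_sets(triples):
    """Return {level: levels whose lead instruction needs a different label}."""
    n = len(triples)
    diff = {}

    def add(i, j):
        before = len(diff.setdefault(i, set()))
        diff[i].add(j)
        return len(diff[i]) > before

    # Direct conflict: same lead instruction and condition, different successor.
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            (a1, c1, b1), (a2, c2, b2) = triples[i - 1], triples[j - 1]
            if a1 == a2 and c1 == c2 and b1 != b2:
                add(i, j)

    # Chaining: identical triples whose successors must differ must differ too.
    changed = True
    while changed:
        changed = False
        for i in range(1, n):
            for j in range(i + 1, n):
                if triples[i - 1] == triples[j - 1] and (j + 1) in diff.get(i + 1, ()):
                    changed = add(i, j) or changed
    return diff

trace = ["anp", "psa", "aga", "ayr", "rsr", "rsr", "rka", "aga", "ayt"]
diff = difference_sets(trace)
```

In this trace levels 4 and 9 conflict directly, and the
conflict chains back to levels 3 and 8 even though those two
levels carry the same triple.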

To compute the lower bound on the number of states in the
algorithm, the minimum number of states needed for each
instruction is summed. For this same example, the
instruction set consists of {a,p,r,t}. The bounds for p, r,
and t are each 1. The bound for 'a' is 2, since difference
set resolution requires at least two different occurrences
of 'a'. Therefore, the minimum number of states with which a
deterministic Moore machine can be constructed for this
trace is 5.
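
The summation can be sketched as follows. How the bound for
a single instruction is derived is not spelled out here, so
the sketch assumes one plausible rule: the size of a greedily
built group of levels that must all differ from one another.
The function name and argument layout are hypothetical.

```python
def lower_bound(alphabet, instr_at, diff):
    """alphabet: instruction symbols; instr_at: {level: lead instruction};
    diff: {level: set of levels that must be labelled differently}."""
    bounds = {instr: 1 for instr in alphabet}

    def must_differ(i, j):
        return j in diff.get(i, ()) or i in diff.get(j, ())

    for level in sorted(diff):
        # Grow a group of levels that pairwise need different labels.
        group = [level]
        for other in sorted(diff[level]):
            if all(must_differ(other, member) for member in group):
                group.append(other)
        instr = instr_at[level]
        bounds[instr] = max(bounds[instr], len(group))
    return bounds, sum(bounds.values())

bounds, total = lower_bound(
    "aprt", {3: "a", 4: "a", 8: "a", 9: "a"}, {3: {8}, 4: {9}}
)
```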


Finally, static processing passes all the information
concerning the input trace to the dynamic processor via a
trace table in the following form. Each level has only one
associated condition and one associated instruction. Since
difference set information is associated with the lead
instruction in an instruction-condition-instruction
sequence, it is entered at that level. Since couple-class
information is associated with the entire
instruction-condition-instruction sequence, it is associated
with the trailing condition-instruction pair.

Figure 17.  Trace Table

2.  Dynamic Processing

Dynamic processing involves assigning prefix labels to the
states of the machine. In this way, separate occurrences of
the same instruction are differentiated. The dynamic
processor is the search mechanism for the synthesizer. It
operates in such a way that, at any point in the synthesis,
the portion of the trace previously processed represents a
deterministic Moore machine. In order to maintain the
determinism, dynamic processing steps through three phases:
(1) assignment of the prefix label to the instruction; (2)
difference set resolution; and (3) dynamic equivalence
assurance. Additionally, each of these phases has built-in
fixup and backup conditions associated with it. The
fixup/backup conditions encountered during difference set
resolution or during dynamic equivalence checking are
indicators that, if the current assignments remain the same,
a nondeterminism will occur in future assignments. As such,
they inform the pruning mechanisms of the search algorithm.

An integral part of the dynamic processor is the failure
memory. It controls the search. The failure memory may be
conceptualized as an L x M matrix where L is the row size
and corresponds to the number of levels in the trace. Each
row has M columns, where M is equal to the lower bound
assigned to the instruction contained on that level of the
trace. An entry into the failure memory at some level i and
some column j, where 1 <= i <= L and 1 <= j <= M, prevents
the assignment of j as a prefix label for the instruction at
level i. When a failure memory cell contains an entry it is
called a valid cell; otherwise it is invalid. Each cell of
the failure memory is a two-element entry. The structure
factor is the first element. It indicates which level of the
trace caused the entry. The free state factor is the second
element. As the name indicates, this element is a function
of the number of free states available at the time of
assignment. The specifics of the failure memory operation
and the nature of failure memory entries will be discussed
throughout the rest of the section as each phase of the
dynamic processor is discussed.
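
The failure memory can be pictured with a small data
structure such as the one below. It is a sketch under stated
assumptions: the class and method names are hypothetical,
an invalid cell is stored as (0, 0), and the free state
factor is taken to be the number of free states plus one, as
in the worked example later in this section.

```python
class FailureMemory:
    """Rows are trace levels, columns are candidate labels (both 1-based)."""

    INVALID = (0, 0)

    def __init__(self, columns_per_level):
        # columns_per_level[i - 1] is M for level i.
        self.rows = [[self.INVALID] * m for m in columns_per_level]

    def is_valid(self, level, label):
        return self.rows[level - 1][label - 1] != self.INVALID

    def enter(self, level, label, cause_level, free_states):
        """Forbid `label` at `level`; an existing entry is left untouched."""
        if not self.is_valid(level, label):
            self.rows[level - 1][label - 1] = (cause_level, free_states + 1)

    def first_invalid(self, level):
        """Lowest label still allowed at `level`, or None if the row is full."""
        for label, cell in enumerate(self.rows[level - 1], start=1):
            if cell == self.INVALID:
                return label
        return None

    def clear_from(self, cause_level):
        """Reset every entry caused by `cause_level` or by a later level."""
        for row in self.rows:
            for k, (cause, _) in enumerate(row):
                if row[k] != self.INVALID and cause >= cause_level:
                    row[k] = self.INVALID

fm = FailureMemory([2, 2, 3])
fm.enter(3, 1, cause_level=1, free_states=0)
```

Keeping the causing level in each cell is what allows
entries to be withdrawn selectively when the search backs
up.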
a.   Label  Assignment 

As previously mentioned, label assignment is the first
function provided by the dynamic processor. A label
assignment can be either forced or arbitrary. Additionally,
the assignment can result in the creation of a new state,
that is, a label-name combination not seen before. A forced
assignment occurs when the instruction at the current
working level is a member of the same couple-class as an
instruction at a prior level, and the lead instruction into
both of those levels has the same label assignment. The
current working level is defined as the level of the trace
which contains the most recently assigned prefix label, but
for which difference set resolution and dynamic equivalence
checking have not been completed. An example is given in the
trace shown in Figure 18.

Figure 18.  Partial Trace Labelling

The label at level 7 is forced by the label assignments at
levels 4 and 5. Notice that the instructions at level 5 and
at level 7 are in the same couple-class, and that the
instructions at levels 4 and 6 have the same prefix label.
This condition forces the instruction at level 7 to have the
same prefix label as the instruction at level 5. The Moore
machine representation of the partial trace is shown in
Figure 19. The assignment at level 8 is also forced for
similar reasons. By definition, any forced assignment
involves previously assigned states, that is,
label-instruction combinations that have been seen before;
therefore, no forced assignment can result in a new state.

Figure 19.  Partially Determined Moore Machine

The failure memory can be used in conjunction with forced
assignments to signal a backup condition to the search. If
the failure memory entry corresponding to the label
assignment at the current working level is valid, then a
contradiction results from the forced assignment. Suppose
that the trace table and failure memory are as shown in
Figure 20, and the forced assignment at level 8 has just
been made. The entry '1.1' at row 8, column 2 of the failure
memory is interpreted in the following manner. The integer
to the left of the decimal point indicates that the entry
was caused by the current assignment at level 1. The '1' to
the right of the decimal point is the number of free states
+ 1 available when the assignment at level 1 caused the
failure memory entry; therefore, when the entry was made
there were no free states available. A free state is one
which has not been bound to a particular instruction.

The assignment at level 8 is forced. In other words, the
sequence of the previous assignments causes the prefix label
of the instruction at level 8 to be a 2. However, the
failure memory contains an entry at row 8, column 2,
FM(8,2). This entry indicates that the instruction at level
8 cannot be assigned the label '2', for if it were to be
assigned a '2', a nondeterminism would result. To resolve
the conflict, backup is initiated until the last unforced
assignment is found. In this case, the backup is to level 6.
The assignment at level 6 will be changed and the search
will continue from there.

Figure 20.  Trace Table/Failure Memory Configuration
for a Forced Assignment
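
The forced-assignment rule and its check against the failure
memory can be sketched as follows. The row layout (a
couple-class name and a label per level) and the function
names are hypothetical, and the failure memory is reduced to
a set of forbidden (level, label) pairs for brevity.

```python
def forced_label(rows, level):
    """rows[k - 1] = {'cc': couple-class name or None, 'label': int or None}.

    Returns the label forced on `level`, or None if it is not forced.
    """
    if level < 2:
        return None
    cc = rows[level - 1]["cc"]
    lead = rows[level - 2]["label"]
    if cc is None or lead is None:
        return None
    for prior in range(2, level):
        same_class = rows[prior - 1]["cc"] == cc
        same_lead = rows[prior - 2]["label"] == lead
        if same_class and same_lead and rows[prior - 1]["label"] is not None:
            return rows[prior - 1]["label"]
    return None

def check_forced(rows, level, forbidden):
    """Return (label, must_backup) for a possibly forced assignment."""
    label = forced_label(rows, level)
    return label, label is not None and (level, label) in forbidden

# Levels 5 and 7 share a couple-class; levels 4 and 6 share label 1.
rows = [{"cc": None, "label": 1} for _ in range(4)]
rows += [{"cc": "X", "label": 2}, {"cc": None, "label": 1},
         {"cc": "X", "label": None}]
```

With these rows the label at level 7 is forced to 2, and a
forbidden pair (7, 2) would turn the same call into a backup
signal.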

If the assignment is not forced, the failure memory row
corresponding to the current working level is searched for
the first occurrence of an invalid cell. An invalid cell is
one which does not contain a failure memory entry. If a cell
is invalid, the assignment of a prefix label corresponding
to the failure memory column index for that cell is possible
on that level of the trace. The column number of the first
invalid cell becomes the label assignment for the
instruction at that level. For example, suppose level 6 is
the current working level and the trace table and failure
memory have the configuration shown in Figure 21.

Figure 21.  Trace Table Entry Showing
Arbitrary Assignment Method

The first invalid entry in the failure memory on row 6 is in
column 3; therefore, instruction 'a' for level 6 will be
assigned a prefix label of 3. These non-forced assignments
may result in the creation of a new state; that is, a
label-instruction pair not previously assigned during the
synthesis. If, at some future point in the search, a backup
is initiated that reaches a level of the trace at which a
new state was created, the backup mechanism will not stop to
perform a retry. At any point in the synthesis, all previous
levels have received assignments based on the constraint
that the minimum number of states has been used consistent
with maintaining determinism; therefore, assigning a
different prefix label to a state which has been defined as
a new state only changes the name of the state, and does not
change the structure of the algorithm. Since the structure
of the algorithm has not been changed, the cause of the
nondeterminism is still present.
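
Arbitrary assignment reduces to a scan of one failure memory
row. The sketch below assumes a row is a list of cells with
(0, 0) marking an invalid cell; arbitrary_label and the
known_states set are hypothetical names.

```python
INVALID = (0, 0)

def arbitrary_label(row, instruction, known_states):
    """Return (label, is_new_state), or None if every cell is valid."""
    for label, cell in enumerate(row, start=1):
        if cell == INVALID:
            # A label-instruction pair not seen before is a new state.
            is_new = (instruction, label) not in known_states
            return label, is_new
    return None

# Columns 1 and 2 are already forbidden, so column 3 supplies the label.
row = [(1, 1), (2, 1), INVALID]
choice = arbitrary_label(row, "a", {("a", 1), ("a", 2)})
```

The is_new_state flag matters later: a level that created a
new state is skipped during backup, since renaming the state
cannot remove the nondeterminism.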

One other type of assignment should be mentioned at this
point. Pseudo-assignment occurs when there is only one
invalid cell left in a failure memory row at a level other
than the current working level and there are no free states
available. Although pseudo-assignment does not immediately
cause a label to be assigned to the instruction at that
level, it does simulate a look-ahead mechanism for the
search technique by triggering difference set resolution and
dynamic equivalence checking as if that level of the trace
were assigned a value. Since the pseudo value is the only
value currently possible for that level, if a backup or
fixup condition is encountered during pseudo-assignment, the
assignment mechanism can immediately try another label at
the current working level, thereby saving the unnecessary
search of a path which it already knows to be nonproductive.

Once a tentative label assignment has been made to the
instruction at the current working level, difference set
resolution and dynamic equivalence checking can be
performed. Although these actions may cause a fixup on the
prefix label at the current working level, their primary
purpose is to furnish information to the failure memory that
will help guide future label assignments.
b.   Difference  Set  Resolution 

Difference set resolution prevents future assignments from
being made that are known to cause nondeterminism if the
current assignments remain unchanged. Difference sets
outline a significant portion of the structure of the input
trace without regard to label assignments in that they
prevent nondeterminism from occurring as a result of the
same transition out of a state leading to more than one
following state. Consider Figure 22.

Figure 22.  Nondeterministic Input Trace


There are several instances where difference set resolution
will force a state to be split into two or more different
states. States 'a', 'g', 'p', and 't' all have
nondeterministic transitions associated with them. The trace
table and failure memory configuration for this trace are
shown in Figure 23.

Figure 23.  Trace Table/Failure Memory Configuration
After Assignment at the Fourth Level

As dynamic processing proceeds with label assignments,
difference set resolution occurs. Difference sets are
resolved by making an entry into the failure memory row at
the level corresponding to the difference set element, and
the column corresponding to the prefix label assigned to the
instruction at the level from which the difference set is
being resolved, if the cell has not already been made valid
through a previous assignment. For example, if the prefix
assignment at level 1 is a '1', the failure memory entries
are made in column 1 at levels 3, 5, 15, and 18. Similarly,
when the assignment '1' is made at level 2, failure entries
are made at levels 4 and 11. Now when the assignment at
level 3 is made, the dynamic processor will not try to
assign a prefix value of '1' since the failure memory cell
at (3,1) is valid. The assignment will automatically be '2'.
Notice that at level 5 the previous assignments have caused
the prefix label to be a '3'. In other words, the failure
memory has caused the search tree to be pruned so that an
assignment of '1' or '2' will not be tried. Either one of
these assignments would have resulted in nondeterminism
being introduced into the trace at level 6.
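
The entry-making step can be sketched as below. The failure
memory is modelled as a dictionary keyed by (level, label),
which is an assumption made for brevity, and
resolve_difference_set is a hypothetical name. The two
difference sets are the ones quoted in the example above.

```python
INVALID = (0, 0)

def resolve_difference_set(failure_memory, diff_sets, level, label, free_states):
    """Forbid `label` at every level in the difference set of `level`.

    Returns the cells that this call made valid.
    """
    made = []
    for other in sorted(diff_sets.get(level, ())):
        cell = (other, label)
        # A cell made valid by an earlier assignment is left as it is.
        if failure_memory.get(cell, INVALID) == INVALID:
            failure_memory[cell] = (level, free_states + 1)
            made.append(cell)
    return made

diff_sets = {1: {3, 5, 15, 18}, 2: {4, 11}}
fm = {}
first = resolve_difference_set(fm, diff_sets, level=1, label=1, free_states=0)
second = resolve_difference_set(fm, diff_sets, level=2, label=1, free_states=0)
```

After the two calls the cell (3, 1) is valid, so label 1 will
not be tried at level 3.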


Figure 24a.  Prefix Label Equals 1

Figure 24b.  Prefix Label Equals 2

Figure 24.  Nondeterministic Prefix Label Assignments

While failure memory entries are being made under difference
set resolution, it is possible for a row to have all cells
valid except one. This has been previously defined as a
situation leading to pseudo-assignment. This situation has
occurred at level 11 in the example given in Figure 23. When
such an occurrence happens, a look-ahead mechanism is
triggered to resolve the difference set at that level. In
this example, the failure memory cell at (21,3) has been
validated with an entry which indicates the current working
level as level 4 when the pseudo-assignment occurred at
level 11. Another situation which can occur in a failure
memory row is when all the entries in the row become valid.
This condition is called an incipient fence. When an
incipient fence exists and there are no free states
available, then no assignment can be made at that level.
This condition is called a fence.
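
The row conditions named above can be told apart with a
short classifier. This is a sketch: the labels 'open',
'pseudo-assignment', 'incipient fence', and 'fence' are
returned as strings for illustration, and the caller is
assumed to apply the pseudo-assignment case only to levels
other than the current working level.

```python
INVALID = (0, 0)

def classify_row(row, free_states):
    """Classify one failure memory row given the number of free states."""
    invalid = [label for label, cell in enumerate(row, start=1)
               if cell == INVALID]
    if not invalid:
        # Every cell is valid: a fence once no free state remains.
        return "fence" if free_states == 0 else "incipient fence"
    if len(invalid) == 1 and free_states == 0:
        return "pseudo-assignment"
    return "open"
```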

Since the search mechanism always knows the level from which
it is doing look-ahead by difference set resolution, it is
able to perform a fixup on the label assignment at the
earliest possible time. A fixup is accomplished by
incrementing the prefix label by one. If an entire row in
the failure memory becomes valid and there are no free
states available, a fixup must be performed on the label
assignment at the current working level. If the label is
left the same, then when the search reaches the fenced
level, no assignment will be possible. Each time a fixup
occurs, all entries made in the failure memory as a result
of the previous label assignment are deleted, and entries
are then made based on the new label.
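
A fixup can be sketched as three steps: withdraw the entries
caused by the old label, increment the label, and re-enter
the difference set under the new label. The dictionary model
of the failure memory, the max_label argument, and the name
fixup are assumptions made for illustration.

```python
def fixup(failure_memory, diff_sets, level, old_label, max_label, free_states):
    """Return the new label, or None if the fixup fails."""
    # Delete every entry that the previous label at this level caused.
    for cell in [c for c, (cause, _) in failure_memory.items() if cause == level]:
        del failure_memory[cell]
    new_label = old_label + 1
    if new_label > max_label or (level, new_label) in failure_memory:
        return None  # not a legal assignment; the caller must back up
    # Make the entries again, now based on the new label.
    for other in diff_sets.get(level, ()):
        failure_memory.setdefault((other, new_label), (level, free_states + 1))
    return new_label

fm = {(3, 1): (1, 1), (5, 1): (1, 1), (4, 1): (2, 1)}
new_label = fixup(fm, {1: {3, 5}}, level=1, old_label=1, max_label=2,
                  free_states=0)
```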
c.   Dynamic  Equivalence 

Couple-class information furnished by static processing aids
in the determination of dynamic nonequivalence. Dynamic
nonequivalence can occur during the synthesis at any level
below the current working level when the couple-classes are
equal. Dynamic equivalence results when instructions in the
same couple-class have been assigned the same prefix label.
Consider Figure 25. The I-C-I triples at levels 5 and 6 and
at levels 11 and 12 are [aea]; therefore, they are in the
same couple-class. The instruction 'a' at level 5 has been
assigned a prefix of '2', and the instruction 'a' at level 6
has been assigned a prefix of '1'. Now, if the instruction
at level 11 is assigned a prefix of '2' and the instruction
at level 12 is assigned a prefix of '1', dynamic equivalence
will occur. Further, the assignment at level 12 will be
forced. Dynamic non-equivalence results when such an
assignment scheme causes non-determinism. Dynamic
equivalence checking functions as a look-ahead mechanism by
preventing the future occurrence of a forced assignment
which will result in nondeterminism. Suppose the synthesizer
is inspecting the trace in Figure 25, and has just assigned
the instruction at level 6 a prefix of '1'.

Notice that level 12 is in the same couple-class as level 6.
Since the instruction at each of these levels is in the same
couple-class, the possibility exists that they may be the
same instruction. If the instruction at level 11 is assigned
a label of '2' when the working level reaches that part of
the trace, then the assignment at level 12 will be a forced
assignment of '1'. However, an entry has already been made
in the failure memory at (12,1) which indicates that the
instruction at level 12 cannot be assigned a prefix label of
1.

Figure 25.  Trace Table/Failure Memory

In order to avoid this contradiction and a backup, dynamic
nonequivalence processing causes an entry at (11,2) of the
failure memory, which corresponds to the labelling of '2'
given to the instruction at level 5. Once this is
accomplished, when the working level descends to level 11,
an assignment of '2' cannot be made and, as a result, the
assignment at level 12 will no longer be forced by dynamic
equivalence, which gives the synthesizer a chance to try
other assignments that will maintain determinism of the
algorithm.
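
The look-ahead just described can be sketched as follows.
After a level is labelled, every later level of the same
couple-class is inspected; if the failure memory already
forbids the matching trailing label there, the matching lead
label is forbidden one level earlier. The dictionary layouts
and the function name are hypothetical.

```python
INVALID = (0, 0)

def block_dynamic_nonequivalence(failure_memory, couple_class, labels,
                                 level, free_states):
    """couple_class: {level: class name}; labels: {level: assigned label}.

    Returns the cells that this call made valid.
    """
    made = []
    trail_label = labels[level]
    lead_label = labels[level - 1]
    for later, name in sorted(couple_class.items()):
        if later <= level or name != couple_class.get(level):
            continue
        # The later level cannot take the trailing label ...
        if failure_memory.get((later, trail_label), INVALID) != INVALID:
            cell = (later - 1, lead_label)
            # ... so its lead level must not take the matching lead label.
            if failure_memory.get(cell, INVALID) == INVALID:
                failure_memory[cell] = (level, free_states + 1)
                made.append(cell)
    return made

# Levels 6 and 12 share a class; level 5 has label 2, level 6 label 1.
fm = {(12, 1): (4, 1)}
made = block_dynamic_nonequivalence(
    fm, {6: "aea", 12: "aea"}, {5: 2, 6: 1}, level=6, free_states=0
)
```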

Pseudo-assignment conditions and fixup conditions can occur
in the failure memory as a result of validation of all but
one of the failure memory cells in a row in the same manner
that they occur in difference set resolution. Additionally,
dynamic equivalence and difference set resolution can
interact to cause failure memory entries in the following
manner. If a failure memory entry is made by difference set
resolution at any level which is in the same couple-class as
a level previously assigned a prefix label, and if the
failure memory entry prevents the assignment that would
cause the instructions to become part of the same state,
then dynamic nonequivalence will result; therefore, an entry
must be made in the failure memory to indicate this
condition.

3.  Backup/Fixup

The discussion of backup and fixup conditions has been saved
until last. The basic idea behind constructing the
synthesizer is to provide as much information as possible to
the search mechanism, and thereby direct the label
assignment with a minimal number of retries. With this in
mind, backup and fixup become last resorts.

The fixup operation attempts to resolve nondeterminism by
incrementing the label at the current working level when a
contradiction occurs. If the newly incremented label is not
a legal assignment or does not correct the contradiction,
then backup must be initiated. The fixup operation cannot be
attempted if the assignment at the current working level is
forced or if the assignment created a new state. In either
of these cases, a fixup operation would leave nondeterminism
in the algorithm.

If a fixup fails, or cannot be attempted, backup is
initiated. Backup must be initiated from the current working
level when any level is discovered which contains one of
these conditions:

1)  The label assignment is forced and the failure memory
cell corresponding to that level and label is valid,

2)  The label assignment causes a contradiction and
represents a new state, or

3)  There is no free state available for the instruction
at a particular level, and all entries in the failure
memory row at that level are valid.

The backup begins at the current working level regardless of
which level triggered the mechanism, and continues until
none of the three conditions given above is present. At that
level a fixup operation is attempted and the search begins
anew. Any entries into the failure memory which were caused
by levels greater than or equal to the new current working
level are invalidated by resetting the failure memory
entries to (0,0). Additionally, any assignments are deleted
along with their side-effects, such as annotations on forced
assignments and new states. If backup causes the working
level to be decremented to zero, a free state is added for
the use of the first instruction needing more states than
initially allotted as the lower bound.
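
A simplified backup can be sketched as below. It assumes the
stopping rule stated earlier in this section, namely that the
search retreats to the last assignment that was neither
forced nor a new state; the fence condition is not modelled.
The record layout and the name backup are hypothetical.

```python
INVALID = (0, 0)

def backup(assignments, failure_memory, working_level):
    """assignments: {level: {'label', 'forced', 'new_state'}}.

    Returns the new working level; 0 means a free state must be added.
    """
    level = working_level
    # Forced and new-state assignments cannot be retried, so pass over them.
    while level > 0 and (assignments[level]["forced"]
                         or assignments[level]["new_state"]):
        level -= 1
    # Delete the assignments made beyond the retry point.
    for deeper in [k for k in assignments if k > level]:
        del assignments[deeper]
    # Reset entries caused by the retry level or by any later level.
    for cell, (cause, _) in list(failure_memory.items()):
        if cause >= level:
            failure_memory[cell] = INVALID
    return level

def record(label, forced=False, new_state=False):
    return {"label": label, "forced": forced, "new_state": new_state}

assignments = {5: record(2), 6: record(1),
               7: record(2, forced=True), 8: record(2, forced=True)}
fm = {(8, 2): (1, 1), (9, 1): (6, 1), (10, 2): (7, 1)}
target = backup(assignments, fm, working_level=8)
```

Here the forced assignments at levels 7 and 8 are passed
over and the search resumes at level 6, where a fixup would
be attempted next.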
III.  PREPROCESSOR

A.  PROBLEM SPECIFICATION

The program synthesizer expects a set of triples where each
triple is an instruction, a condition, and an instruction.
Biermann [2] has shown that conditions inadvertently or
purposely omitted by the user may be inserted into a trace.
The algorithm for insertion of conditions collects the set
of atoms seen on the transitions for an instruction. An atom
is an entity which has a value of either 'true' or 'false'.
A condition is composed by logical conjunction and
disjunction operations on atoms. For example, an atom may be
'c <= 0', but a condition may be 'c <= 0 and a = 4'. A set
of minterms is computed from the set of atoms and one of the
minterms is inserted after each occurrence of that
instruction in the trace. If {a,b} is a set of atoms, then
the set of minterms will be
{{a,b},{~a,b},{a,~b},{~a,~b}} where ~ stands for logical
negation. It has been shown in reference [16] that only one
of the minterms can be true for each occurrence of a
transition from any single instruction.
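
Minterm generation is a small enumeration, sketched below.
Atoms are represented as strings and a negated atom is
written with a leading '~'; both choices, and the function
names, are assumptions made for illustration.

```python
from itertools import product

def minterms(atoms):
    """Return every true/false assignment to the atoms as a set of literals."""
    atoms = sorted(atoms)
    result = []
    for signs in product((True, False), repeat=len(atoms)):
        result.append(frozenset(a if s else "~" + a
                                for a, s in zip(atoms, signs)))
    return result

def true_minterm(atoms, values):
    """Pick the one minterm satisfied by `values`, a dict {atom: bool}."""
    return frozenset(a if values[a] else "~" + a for a in atoms)

terms = minterms({"a", "b"})
chosen = true_minterm(["a", "b"], {"a": True, "b": False})
```

Because each minterm fixes the truth value of every atom,
exactly one of them matches any given set of atom values,
which is the property the insertion algorithm relies on.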

One problem with the algorithm is that it is incapable of
inserting conditions if the user has failed to supply any
atoms after a particular instruction. For example, if the
user should specify instruction I1 followed by instruction
I2 in one part of the trace and instruction I1 followed by
I3 in another part of the trace, but the user fails to
provide a condition after either occurrence of I1, then the
algorithm will be unable to generate a condition for I1. It
is assumed that I1 does not appear with an atom elsewhere in
the trace. The synthesizer will force two states for I1 to
resolve any nondeterminism. This mechanism is fully
explained in Section II. If conditions had been supplied in
the above example, the difference in the two programs would
be the number of states assigned to instruction I1. Figure
26 shows a partial computation without explicitly expressed
conditions along with the associated synthesized program
fragment. Figure 26 assumes that I1 does not appear
elsewhere in the trace. Figure 27 is a representation of the
same partial computation except that the conditions c1 and
c2 have been explicitly expressed. The computations in both
figures are the same, and each program fragment will
correctly execute either trace; therefore, the programs must
be equivalent programs with respect to program behavior.
However, the program in Figure 27 is minimal in that it
contains fewer states because the user explicitly supplied
the conditions.

Figure 26.  Computation without Explicit Conditions
(panels: Example Computation; Synthesized Program)

Figure 27.  Computation with Explicit Conditions
(panels: Example Computation; Synthesized Program)

We intend to show that there are mechanisms which can be
used to automatically generate the necessary conditions for
the correct synthesis of an algorithm produced by an example
computation without the user explicitly defining them. The
problem may be described as follows. Given an example
computation without explicitly defined conditions, infer
those conditions necessary to control the flow of
computation in a manner such that the synthesized program
will demonstrate the behavior desired by the user. In order
to facilitate the solution to the problem, a condition will
be viewed as a function that returns a value of 'true' or
'false' when called rather than a logical operation on
atomic boolean entities. The problem can then be thought of
as constructing a function.

Very little information is available to the current version
of the synthesizer when the user provides only a sequence of
instructions, certainly not enough to generate minimal
programs as described in Figure 27. This led us to search
for other sources of information that would allow us to
construct the necessary conditions. We soon realized that
the instructions issued by the user do not exist in a
vacuum. These instructions manipulate data. If the entire
computer memory, including registers, is viewed as the
domain of interest, then execution of an instruction always
changes this state. Intuitively, the domain also reflects
the reason that the user decided to execute a particular
instruction. A search of a space of this size in order to
determine the reason is impractical; however, observing only
those data elements affected by the sequence of instructions
can often be quite practical and can significantly reduce
the search space.

We chose the text editing domain as the domain of
interest since we felt that it would be sufficiently
interesting to warrant application of synthesis techniques.
This domain was selected because, first, techniques
developed in this domain may be general enough for extension
into other domains; second, the world for this domain can
be described as the set of all characters contained in a
particular text file, which makes the world finite; and
finally, the instruction set is small enough to be
manageable.

Although our primary research is directed toward
studying techniques to apply to automatic condition
generation, we feel that the synthesizer could be a powerful
text editor and could provide some useful features not
normally seen in conventional text editors. Extended
features could include the ability to capitalize the first
letter of every sentence, the ability to capitalize all
small letters in the text, the ability to identify a string
and perform some operation before, after or on it, or any
combination of these editing actions.

The working hypothesis is to have the user process the
text file in a normal manner and have the synthesizer infer
a program from his actions. Two requirements were levied
upon the user. The first requirement is that he must inform
the synthesizer when he desires to have a program generated
so that the synthesizer can begin monitoring the user's
actions. A great deal of time was spent trying to find
methods that allowed one general mechanism to be used to
monitor the user's actions and the resulting changes in the
text file. Since we could not produce such a mechanism, a
second requirement was levied on the user. This requirement
recognizes a basic distinction between two different aspects
of text editing: context free substitutions and context
sensitive substitutions. We define a context free
environment to be one in which the character to be operated
upon is not dependent on characters around it. Capitalizing
all occurrences of small letters is an example of a context
free operation. A context sensitive operation is defined as
an operation in which the action to be performed on a
character or sequence of characters depends upon other
characters around the main character of interest.
Capitalizing the first letter of every sentence is a context
sensitive operation. Condition inference in a context
sensitive environment is inherently more difficult than in a
context free environment in that the condition must be
constructed from events which require a look-ahead
capability not inherent in the synthesizer. The user will be
free to switch from environment to environment at his
convenience. The synthesizer will create program segments
from each environment which can be used to construct a
complete program by a post-processor.

B.   DESIGN FOR A CONTEXT FREE ENVIRONMENT
1.   Overview

Programs that operate on a single entity can be
constructed by the synthesizer. Figure 28 shows the
construction of a program from a trace intended to
communicate that the letter d should be capitalized
wherever it appears in the text file. The column labelled
'trace' contains triples of the form instruction, condition,
instruction. B is the start instruction, R is the move right
instruction, C is the capitalize or change instruction and S
is the stop instruction. The conditions for this trace are
the characters seen in the text file prior to the execution
of the second instruction in each triple. The special
condition "0" is the null condition, and is always inserted
after the start instruction.

The generated program will correctly execute the
trace that was used to construct it, and by examination of
the program it can be shown that the program will convert
all d's to D's in a text file consisting of the characters
A, b, C, d, F and G. There are no arcs available for other
characters in the character set. In order to generate a
program to perform the same function on an arbitrary text
file, the user would be forced to give an example of the
desired transition for every character in the character set.

Since it is desirable to relieve the user of the
chore of providing an inordinate number of examples in order
to completely specify the function, a method is required
that utilizes a few examples of the types of conditions that
are to appear on the arcs to generalize the conditions into
a more compact and complete form. If a generalization can be
found, the multiple arcs may be replaced with a more general
condition and, therefore, correct programs can be created
with fewer examples. However, the combination of arcs between
nodes must be accomplished so that determinism is maintained,
or the synthesizer will not create a minimum state machine
capable of performing the desired function. That means that
the generalization technique must be able to handle
conflicts properly. The arcs in Figure 29 that originate at
state R and terminate at state R appear to consist of
elements from the capital letters and small letters. The
generalization {x | x ∈ capital letters} ∪ {z | z ∈ small
letters} would appear to be a reasonable replacement for all
of the R to R arcs. If this generalization were made, a
conflict would result because the letter 'd', which must lead
from R to C, is also an element of {z | z ∈ small letters}.

Trace  Synthesized  program 


Figure  28.   Synthesizer  Action 

2.   Structure of the Condition Preprocessor

The preprocessor is designed to accumulate knowledge
from the traces it is provided, then use the knowledge to
construct meaningful conditions. The preprocessor scans the
input trace looking at the instructions and the characters
that are seen before the instructions. This phase extracts
pairs of instructions from the trace. The trace in Figure 28
would have the instruction pairs (B,R), (R,R), (R,C) and
(C,R) extracted. Attached to each of these pairs is the set
of characters that were seen between the pair. The
preprocessor then analyzes the information to determine if a
generalization can be made from the set of characters
associated with each instruction pair.
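
The pair extraction phase can be sketched as follows. This
is only an illustration: the function name extract_pairs and
the trace below are hypothetical (the trace is not the one
in Figure 28), and None stands in for the null condition.

```python
from collections import defaultdict

def extract_pairs(instructions, seen):
    """Map each pair of consecutive instructions to the characters
    seen between them.

    seen[i] is the character observed just before instructions[i]
    was executed (None for the null condition).
    """
    pairs = defaultdict(set)
    for i in range(len(instructions) - 1):
        pair = (instructions[i], instructions[i + 1])
        pairs[pair].add(seen[i + 1])
    return dict(pairs)

# Hypothetical trace: B = start, R = move right, C = capitalize, S = stop.
instructions = ["B", "R", "R", "C", "R", "R", "S"]
seen = [None, None, "A", "d", "D", "b", "G"]
pairs = extract_pairs(instructions, seen)
```

For this trace the pair (R,R) collects the characters A and
b, while (R,C) collects only d, which is the raw material
the analysis phase would try to generalize.
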

The natural division mentioned above allows the
preprocessor to be divided into two modules. The first
module performs the scanning function while the second
module analyzes the information and applies a heuristic to
provide the most general condition possible. The
implementation of the preprocessor will be discussed later,
but before it can be discussed an explanation of the data
structures required by the preprocessor is needed.

3.   Preprocessor Data Structures

To simplify the problem we define two types of
instructions in this domain. Instructions that specify the
current location of interest are cursor positioning
instructions. Instructions that change the state of the
domain are data manipulation instructions. The preprocessor
accepts as input a sequence of instructions and an
associated sequence of characters. The first instruction in
the instruction sequence is always the start instruction,
which does not have a character associated with it. The last
instruction in the sequence is always a halt instruction.
Every action performed by the user is captured and appended
to the instruction sequence list. The character sequence is
created in harmony with the instruction sequence. In the
quiescent state the cursor will indicate a certain position
in the text. When the user performs some action, such as
moving the cursor right, a monitor picks up the value in the
old position and associates that value with the instruction
executed by the user. For example, assume a user has a text
file in lower case letters that he wants to change to all
upper case letters. The user initiates the synthesizer, then
proceeds across the line of text changing lower case letters
to upper case letters. For the purpose of this example,
assume the line of text is "change lower case to upper
case". As the user moves across the line making
substitutions, the condition monitor captures the actions
performed and the characters seen. The example line would
yield an instruction sequence of (B, C, R, C, R, C, R, C,
..., C, S). The associated character sequence would be (c,
C, h, H, a, A, ..., e, 0). The "C" and "R" in the
instruction sequence are the capitalize and move right
instructions, respectively. Note that the capitalize
instruction does not reposition the cursor, and when the user
moves the cursor to the right, the result of the capitalize
instruction is associated with the move.

Another data structure needed by the preprocessor is
the ASCII vector. The ASCII vector is a 128-byte linear
array with indices numbered 0 through 127. Each byte in the
array is referenced by the decimal value of a particular
ASCII character. For example, the array element reserved for
the ASCII character '0' is indexed by 48 decimal. The array
element reserved for the ASCII character 'A' is indexed by
65 decimal. The vector defines a partition of the ASCII
character set by using the following technique. The ASCII
character set has been divided into eight mutually exclusive
subsets.

Subset 0     Capital letters
Subset 1     Small letters
Subset 2     Numbers
Subset 3     Space character <sp>
Subset 4     Symbols
Subset 5     Punctuation
Subset 6     Arithmetic operators
Subset 7     Control characters

The subset name is entered into the ASCII vector at each
cell by converting the ASCII character to its decimal
equivalent and using that value as the array index. The
default partition is shown in Figure 29.

ASCII        0   1  ...   9         A   B  ...   Z

Figure 29. ASCII Vector
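
A minimal sketch of how such a vector might be initialized
is given here. The subset labels follow the list of eight
subsets; the exact membership of the punctuation and
arithmetic operator subsets is not enumerated in the text,
so PUNCTUATION_CHARS and ARITHMETIC_CHARS below are
assumptions, as is the rule that every remaining printable
character counts as a symbol.

```python
import string

# Subset labels 0 through 7, in the order of the subset list.
CAPITAL, SMALL, NUMBER, SPACE, SYMBOL, PUNCTUATION, ARITHMETIC, CONTROL = range(8)

# Assumed membership: the text does not enumerate these two subsets.
PUNCTUATION_CHARS = set(".,;:!?'\"")
ARITHMETIC_CHARS = set("+-*/=<>")

def build_ascii_vector():
    """Return a 128-entry list mapping each ASCII code to a subset label."""
    vector = []
    for code in range(128):
        ch = chr(code)
        if ch in string.ascii_uppercase:
            label = CAPITAL
        elif ch in string.ascii_lowercase:
            label = SMALL
        elif ch in string.digits:
            label = NUMBER
        elif ch == " ":
            label = SPACE
        elif ch in PUNCTUATION_CHARS:
            label = PUNCTUATION
        elif ch in ARITHMETIC_CHARS:
            label = ARITHMETIC
        elif 33 <= code <= 126:
            label = SYMBOL      # remaining printable characters (assumed)
        else:
            label = CONTROL     # codes 0 through 31 and 127
        vector.append(label)
    return vector

ascii_vector = build_ascii_vector()
```

Looking up a character is then a single index operation,
for example ascii_vector[ord("A")], which is what makes the
vector a cheap way to classify each character in the trace.
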
The character set hierarchy is defined by the tree
structure in Figure 30. The tree is related to the ASCII
vector through the character subset names contained on each
node one level above the leaf nodes. For the default
hierarchy shown in Figure 30, a zero would be entered in the
ASCII vector for all capital letters, and a 1 would be
entered for all small letters. If a different partition of
the character set is required, the user can modify the
hierarchy or create his own. An example will be given to
explain how the modification may be accomplished. Assume a
partition is desired where the vowels are isolated into a
set. Assume further that the vowels are to be subdivided
into capital vowels and small vowels. The hierarchy would be
modified by placing a son called 'vowels' on the alphabetic
node. Attach to the new node two sons, called 'Cap-vowels'
and 'Small-vowels', with arcs to the appropriate characters.
Relabel the hierarchy so that sibling relations are numbered
in increasing order. Finally, initialize the ASCII vector
with the new labelling. All of the modifications can be done
by the system when the user calls for the modification. The
modified hierarchy is shown in Figure 31.

Figure 31. Modified Hierarchy

The next data structure used by the preprocessor is
the transition table. The transition table contains the
knowledge gleaned from scanning the instruction sequence and
the character sequence created by the monitor. Figure 32
shows the format of the transition table. The transition
table is an array of records with each record containing
information on a transition. In the table, I1 and I2 are
instructions where I2 directly follows I1 in at least one
place in the instruction sequence. 'Active-sets' is a field
that contains information on sets of characters that have
been observed by the monitor on the transition from I1 to
I2. The fields 'Set-1' through 'Set-n' contain the value for
set name, the count of the elements from the set associated
with the transition and a pointer to a linked list of the
elements. The records that would be created for the trace
given in Figure 28 would be associated with the transitions
B to R, R to R, R to C, C to R and R to S.


! I1 ! I2 ! Active-sets ! Set-1 ! Set-2 ! ... ! Set-n !
!  . !  . !      .      !   .   !   .   !     !   .   !
!  . !  . !      .      !   .   !   .   !     !   .   !

Figure 32. Format of the Transition Table

4.   Implementation

The context free preprocessor consists of two main
modules: the scanner and the insertion modules. Another
important module, not part of the preprocessor, is the user
monitor. The monitor gathers the actions of the user and
creates two arrays. One array contains the sequence of
instructions the user provided and the other contains
information on what was true before each instruction was
executed. The information that is gathered is then passed to
the appropriate preprocessor.

The example instruction and character sequences
given in Figure 33 will be used to explain the mechanism of
the preprocessor. Figure 33 illustrates a collection of
actions that were performed by some user. The user's goal
is: change all lower case letters in a text file into upper
case letters. The user has activated the condition monitor,
positioned the cursor at the beginning of a line of text and
moved right along the line, changing the lower case letters
to upper case whenever one appeared above the cursor. Figure
33 is an example of output from the monitor assuming the
line the user processed was "The numbers 1, 2, 3, 5, 7 ARE
prime.". The first column in Figure 33 is the character
array. It contains the character under the cursor prior to
execution of the instruction in column two. Column two is a
trace of the actions performed by the user. The "r"
represents the "move cursor right" instruction and the "c"
represents a "change without cursor reposition" instruction.
Figure 33 can be read as: the character in column one was
observed and the instruction in column two was executed.

Figure 33. Monitor Output

The scan module of the preprocessor is activated
when the user indicates the representative example is
complete. Let 'inst-index' be an index for the instruction
array that is initialized to 1. The first step is to create
a transition from the start instruction to the first
instruction in the instruction array and add the transition
to the transition table. This transition will indicate the
beginning of the program and will transition to the first
instruction provided on a null condition. The module then
moves down the instruction array creating other transitions
and adding them to the transition table. Duplicate
transitions will not appear in the table. A transition is
defined as a pair (I1,I2), where I1 and I2 are instructions
and I2 follows I1 within the instruction array. The
instruction array in Figure 33 yields transitions (R,C),
(C,R), (R,R).

The transitions are constructed by indexing through
the instruction array. The instructions at inst-index and
inst-index + 1 form a transition. The transition is then
matched against the transition table. If a match occurs, the
character in the character array at inst-index + 1 is
extracted and its ASCII value is used to index into the
ASCII vector. The value stored in the ASCII vector is used
as an exponent for two and the result is stored in a
temporary variable. A bit by bit logical OR is performed
between the temporary variable and the Active-sets variable
for the transition and the result is stored in Active-sets.
Active-sets contains the information of every set from the
partition that has elements seen on the transition. The
operation described above allocates one bit for each set in
the partition. If Active-sets equals 1, then the first
(low-order) bit of Active-sets is a 1, signifying that at
least one element of the first set (subset 0) has been seen
on this transition. A two would signify that some element of
the second set (subset 1) had been seen, and a three would
signify that some element of each of these two sets had been
seen.
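
The bit manipulation just described can be sketched in a few
lines. The helper subset_of is a hypothetical stand-in for
the ASCII vector lookup and covers only the first four
subsets of the default partition.

```python
def subset_of(ch):
    """Stand-in for the ASCII vector lookup (first four subsets only)."""
    if "A" <= ch <= "Z":
        return 0
    if "a" <= ch <= "z":
        return 1
    if "0" <= ch <= "9":
        return 2
    if ch == " ":
        return 3
    raise ValueError(f"subset not modelled for {ch!r}")

def active_sets(chars):
    """OR together two raised to the subset label of every character seen."""
    mask = 0
    for ch in chars:
        mask |= 1 << subset_of(ch)   # 1 << k equals two raised to the k
    return mask
```

For example, active_sets("A") is 1, active_sets("b") is 2,
and active_sets("Ab") is 3, matching the three cases in the
paragraph above.
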

In the transition table are fields for each set that
has been determined to be active for the transition. Within
each of the set fields there are three subfields: the first
is the set name, the second is a count of the elements seen
for the set and the last is a pointer to the start of a
circularly linked list containing the elements used from the
set. The value that was obtained from the ASCII vector is
used as a set name and matched against each of the set
fields' set names. If the set name matches an entry, the
character at inst-index + 1 is added to the linked list in
lexicographical order if not already on the list, and the
count is incremented by one. If a match does not occur on
the set name, a new set field is created and given the name
that was obtained from the ASCII vector, the count is set to
one, and the character is put on the list.

When the scan module reaches the end of the input,
the transition table contains an entry for each transition
that was seen. Each transition is associated with all the
sets that had elements seen with the transition. Finally,
each transition is associated with the actual elements
through the linked list for each set. The information is
then passed to the insertion module for analysis. Figure 34
shows the completed transition table and the linked list of
elements for each set.
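
A compact sketch of the scan over the two arrays is given
below. It is an illustration under stated assumptions, not
the original implementation: a sorted Python list replaces
the circularly linked list, the class and function names are
hypothetical, the initial transition from the start
instruction is omitted, and the subset lookup is passed in
as a parameter.

```python
from bisect import insort
from dataclasses import dataclass, field

@dataclass
class SetField:
    name: int                                     # subset label from the ASCII vector
    count: int = 0                                # distinct elements seen so far
    elements: list = field(default_factory=list)  # kept in lexicographical order

@dataclass
class TransitionRecord:
    i1: str
    i2: str
    active_sets: int = 0
    set_fields: dict = field(default_factory=dict)  # set name -> SetField

    def record(self, ch, subset):
        self.active_sets |= 1 << subset
        entry = self.set_fields.setdefault(subset, SetField(subset))
        if ch not in entry.elements:
            insort(entry.elements, ch)
            entry.count += 1

def scan(instructions, chars, subset_of):
    """Build the transition table from the monitor's two arrays."""
    table = {}
    for i in range(len(instructions) - 1):
        key = (instructions[i], instructions[i + 1])
        rec = table.setdefault(key, TransitionRecord(*key))
        ch = chars[i + 1]          # character seen before the second instruction
        rec.record(ch, subset_of(ch))
    return table

# Hypothetical monitor output for capitalizing the word "The".
table = scan(["R", "C", "R", "C", "R"],
             ["T", "h", "H", "e", "E"],
             lambda ch: 0 if ch.isupper() else 1)
```

Here the (R,C) record ends up with Active-sets equal to 2
and the elements e and h, while the (C,R) record has
Active-sets equal to 1 and the elements E and H.
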

Once a completed transition table has been created,
control is passed to the insertion module. The insertion
module processes the information in the transition table and
assigns a condition for each transition.

NOTE: The notation <1>, <2>, etc. represents a pointer to
the linked list headed by the same symbol.

Figure 34. Completed Transition Table

The Active-sets entries provide an efficient
mechanism for recognizing potential conflicts on emanating
arcs. Performing a bit by bit AND on the Active-sets entries
that have a common originating instruction yields the source
of conflicts. The bit positions that are on (bit equals 1)
are the set (or sets) that have had elements on multiple
transitions. For example, let (I1,I2) and (I1,I3) be entries
in the transition table with Active-sets values of five
(0101 binary) and three (0011 binary) respectively. Let Q
equal the result of the bit by bit AND of the Active-sets
values given above (i.e. 0001). Q indicates that there is a
conflict between the transition (I1,I2) and the transition
(I1,I3). Furthermore, Q indicates that the set causing the
conflict is labelled zero in the hierarchy of Figure 30
because the on bit is in the rightmost position, which
corresponds to two raised to the zero exponent. Using the
exponent to enter the hierarchy, it can be determined that
capital letters were seen on both transitions. Once all the
conflicts for transitions with the same originating
instruction are known, the conflicts must be resolved before
an assignment of conditions can be made.
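
The conflict test can be sketched as a pairwise AND over
transitions that share an originating instruction. The
function name find_conflicts and the dictionary layout are
assumptions made for illustration; the Active-sets values
are those of the example above.

```python
from itertools import combinations

def find_conflicts(active):
    """active maps (I1, I2) to an Active-sets bitmask.

    Returns a list of (transition, transition, conflicting set labels)
    for every pair of transitions with a common originating instruction.
    """
    conflicts = []
    for (t1, m1), (t2, m2) in combinations(active.items(), 2):
        if t1[0] != t2[0]:
            continue                 # different originating instruction
        q = m1 & m2                  # bit by bit AND
        if q:
            labels = [bit for bit in range(q.bit_length()) if (q >> bit) & 1]
            conflicts.append((t1, t2, labels))
    return conflicts

active = {("I1", "I2"): 0b0101, ("I1", "I3"): 0b0011, ("I2", "I3"): 0b0001}
conflicts = find_conflicts(active)
```

The only conflict reported is between (I1,I2) and (I1,I3) on
set label 0, the capital letters; the (I2,I3) entry is
ignored because it originates at a different instruction.
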

Extending the example given above, assume that eight
capital letters were seen on transition (I1,I2) and four
capital letters were seen on the transition (I1,I3). A
partial condition can be constructed for the transition
(I1,I2) as a set difference between the set of capital
letters and the actual elements seen on the transition
(I1,I3). The partial condition for the (I1,I3) transition
becomes the set of capital letters that were actually seen
with this transition. The initial conditions for these
transitions become the union of the sets indicated in
Active-sets as not being in conflict and the sets created by
the resolution of conflicts. Therefore, the condition for
(I1,I2) is ({x | x ∈ capital letters} - {x | x ∈ capital
letters on other transitions}) ∪ {x | x ∈ numeric}, and the
condition for (I1,I3) becomes {z | z ∈ ({actual capital
letters seen} ∪ {small letters})}. In this example, it was
assumed that the sets numeric and small letters were an
appropriate generalization for the transition. In practice,
this generalization cannot be made without consideration of
the number of elements that have been seen from the set on
the transition. If the count field for the set exceeds a
threshold value for the set, the generalization may be made;
otherwise the elements themselves become the partial
condition for the transition.
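
A rough sketch of this resolution step follows. The text
does not say how the transition that receives the
generalized set is chosen, so the rule used here (the
transition with more observed elements is generalized,
provided its count exceeds the threshold) is an assumption,
as are the function name and the threshold values.

```python
import string

CAPITALS = frozenset(string.ascii_uppercase)

def resolve_conflict(seen_a, seen_b, full_set, threshold):
    """Return partial conditions, as sets, for two conflicting transitions.

    Assumption: the transition with more observed elements is
    generalized to full_set minus the other transition's elements,
    but only if its count exceeds threshold.
    """
    if len(seen_a) < len(seen_b):
        cond_b, cond_a = resolve_conflict(seen_b, seen_a, full_set, threshold)
        return cond_a, cond_b
    if len(seen_a) > threshold:
        return full_set - seen_b, set(seen_b)
    # Too few examples: the observed elements themselves are the condition.
    return set(seen_a), set(seen_b)

# Eight capitals seen on (I1,I2), four on (I1,I3); the letters are hypothetical.
seen_12 = set("ABCDEFGH")
seen_13 = set("WXYZ")
cond_12, cond_13 = resolve_conflict(seen_12, seen_13, CAPITALS, threshold=5)
```

With a threshold of 5, the (I1,I2) condition becomes every
capital letter except W, X, Y and Z, while (I1,I3) keeps
exactly the four letters it saw. Raising the threshold above
8 leaves both conditions as the observed elements only.
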

After a condition has been constructed for a
transition, a final strong generalization technique is
employed. The Active-sets value for the transition again
supplies the starting point for this technique. Notice that
adjacent bits in Active-sets correspond to adjacent nodes in
the hierarchy. Therefore, a check is made of Active-sets to
see if it has adjacent bits with a value of one. If it does,
then a generalization may be attempted. Assume the condition
(({capital letters} - {A, E, I, O, U}) ∪ {small letters} ∪
{numeric}) has been constructed for some transition. The
Active-sets value for this transition must be seven (0111
binary). With the default hierarchy in Figure 30, a
generalization to Alphabetic and then to Alpha-numeric would
be attempted. Notice that an unqualified generalization to
Alpha-numeric would fail because of a conflict with another
transition. Intuitively, ({alpha-numeric} - {A, E, I, O, U})
would be a correct choice for the condition for this
transition. A general procedure for the construction of
generalized conditions is given below.

A set of nodes Y = {y_1, y_2, ..., y_n} is
generalizable to a node X if the set of nodes Y forms a
complete and exhaustive set of leaves of the subtree rooted
at X. Further, a set of nodes Z = {z_1, z_2, ..., z_m} is
generalizable to the set W = {w_1, w_2, ..., w_j}, j < m,
where each w is a generalization of a subset of Z.

IF the condition = F_1 ∪ F_2 ∪ ... ∪ F_n
     where F_i = z_i - q_i, i = 1,n
     where q_i ⊂ z_i (q_i possibly null)
THEN
     the condition is set to W - ∪ q_i
     where W is the smallest set
          W = {w_1, w_2, ..., w_j}
     such that W generalizes {z_1, z_2, ..., z_n}
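
The procedure can be sketched as a bottom-up merge over the
hierarchy. Figure 30 is not reproduced in this text, so the
HIERARCHY fragment below is an assumption built from the
node names mentioned above, and the function names are
hypothetical. The sketch does not re-check the result
against other transitions; it only carries the excluded
elements q_i along.

```python
# Assumed fragment of the default hierarchy: parent -> children.
HIERARCHY = {
    "Alpha-numeric": ["Alphabetic", "Numbers"],
    "Alphabetic": ["Capital letters", "Small letters"],
}

def generalize(nodes):
    """Replace every complete set of children by its parent until stable."""
    nodes = set(nodes)
    changed = True
    while changed:
        changed = False
        for parent, children in HIERARCHY.items():
            if set(children) <= nodes:
                nodes -= set(children)
                nodes.add(parent)
                changed = True
    return nodes

def generalize_condition(terms):
    """terms is a list of (z_i, q_i) pairs, each meaning F_i = z_i - q_i.

    Returns (W, union of the q_i), read as the condition W - union(q_i).
    """
    w = generalize(node for node, _ in terms)
    excluded = set().union(*(q for _, q in terms))
    return w, excluded

terms = [
    ("Capital letters", set("AEIOU")),
    ("Small letters", set()),
    ("Numbers", set()),
]
w, excluded = generalize_condition(terms)
```

For the example condition this yields W = {Alpha-numeric}
with the excluded elements A, E, I, O and U, that is, the
condition ({alpha-numeric} - {A, E, I, O, U}) suggested
above.
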
C.   DESIGN FOR A CONTEXT SENSITIVE ENVIRONMENT
1.   Overview

Condition generation in the context sensitive
environment is a more difficult task than in the context
free environment. This difficulty arises from the scope of
knowledge required to make decisions on what a condition is
to be. The conditions themselves are more complex because
they depend not only on the character that is being seen,
but also on characters that precede and follow the
current character under consideration. The following example
will be used to illustrate the difficulties and our solution
to this problem. Assume a user wishes to capitalize all
occurrences of the word 'time' in some text file. Also
assume that the word occurs at the beginning, at the end,
and in the middle of sentences in the text file. The
question is how to construct a program that performs the
desired function given only the actions the user performs as
an example of the required program.

The assumption about the position of the word 'time'
in the text file implies that the requested action needs to
be accomplished on strings that have very different
characteristics. Certainly, both 'time' and 'Time' should be
capitalized, as should 'time,', 'time?' and 'time<sp>'. On
the other hand, the string 'time' should not be capitalized
when it occurs within a word like 'sometime' or 'timely'.

Any generated program that behaves as described
above must be able to recognize an occurrence of the string
or some variation of the string. The totality of this
information must be glued together to provide a single
condition that is descriptive of what the surrounding
environment must be like before the action is performed. The
implication is that the condition itself must be able to
perform checking and look-ahead. In other words, the
condition for the transition to the operation must in fact
be a procedure which responds 'true' whenever the string of
interest is recognized. Assume for the present that the
string of interest can be discerned from the user's actions
(a hard problem by itself, see Angluin [19]). One must still
wonder how such a procedure can be constructed and then
inserted into the generated program so that it performs the
function of a condition on some transition in the program.
Figure 35 shows a procedure which recognizes the word
'time'. Note the robustness of the procedure in that it
distinguishes between the differing occurrences of 'time' as
mentioned above. Figure 35 points out that the problem is
not just generating a procedure as a condition but also
generating conditions within the procedure that is to be the
overall condition. The arcs labeled 'T v t' and '<sp> v
{punctuation}' should be noted with interest because they
provide the robustness the condition procedure needs. The
discovery of arc labels for the condition procedure will be
discussed next.

Arc labels: ({ASCII} - {<sp>}), <sp>, (T v t),
(<sp> v {Punc.}), (<sp> v {Punc.}), Requested Operation

Figure 35. Condition for "time" and "Time"
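
A hand-written equivalent of such a condition procedure is
sketched below to make the look-behind and look-ahead
concrete. It is not the procedure of Figure 35: the function
name, the punctuation set, and the treatment of the start
and end of the text as word boundaries are all assumptions.

```python
PUNCTUATION = set(".,;:!?")   # assumed contents of {Punc.}

def at_word_time(text, i):
    """True if 'time' or 'Time' starts at text[i] as a whole word."""
    # Look behind: the word must start the text or follow a space.
    if i > 0 and text[i - 1] != " ":
        return False
    # The arc (T v t), then the remaining letters of the word.
    if text[i:i + 1] not in ("T", "t") or text[i + 1:i + 4] != "ime":
        return False
    # Look ahead: the arc (<sp> v {Punc.}), or the end of the text.
    following = text[i + 4:i + 5]
    return following == "" or following == " " or following in PUNCTUATION

text = "Time flies; sometime a timely time."
hits = [i for i in range(len(text)) if at_word_time(text, i)]
```

On this sample line the predicate is true only at the
leading 'Time' and at the final 'time.', and false inside
'sometime' and 'timely'.
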
2.   Implementation

The monitoring of user actions provides the
instruction and character sequences in the same manner as in
the context free mode. Consideration was given to requiring
that more information be provided by the monitor; however,
the notion was discarded because it would require the user
to be aware of the functioning of the preprocessor.
Requiring the user to provide information to the system
would betray our goal for the system. The user should only
be required to initiate the system and then perform editing
as if the system were not actively monitoring his actions.
We feel the requirement of specifying whether the user wants
to perform context free or context sensitive operations is
the maximum that should be asked. If it were feasible to
recognize the difference between the two modes from the
user's actions alone, this limitation would also be removed.

Given only the instruction sequence, the character sequence, and
the knowledge that the environment is context sensitive, the first
task of the context sensitive preprocessor is to discern the string
of characters upon which some operation is to be performed. This is
a pattern recognition problem of considerable difficulty. Angluin
[19] provides the following theorem: "There is an effective
procedure which, when given a sample S as input, outputs a pattern
p which is descriptive of S." The sample S is a subset of the set
of all strings over the alphabet of the language. The effective
procedure is computationally expensive and not desirable to
implement in our system. The procedure is an enumeration technique
on patterns no longer than the shortest example in the sample set
S. Each of the enumerated patterns is tested to determine if it is
descriptive of the entire set S. The longest pattern that is
descriptive of S is the most specific pattern for the set. Clearly,
as the length of the sample grows, the number of enumerated
patterns will grow exponentially. Angluin [19] states, "in the
general case, the test performed on the patterns is an NP-complete
problem." The test she is referring to is the check to see if the
enumerated pattern is descriptive of S.
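
To make the cost of this check concrete, the following is a minimal
sketch of only its membership half: deciding whether a single string
can be produced from a pattern by consistently substituting a
non-empty string for each variable. The token encoding (one-character
strings for constants, integers for variables) and the names
`matches` and `covers` are assumptions made for illustration; they
are not taken from Angluin [19] or from the implemented system.

```python
def matches(pattern, s, binding=None):
    """Return True if string s can be produced from pattern.

    pattern is a list of tokens: a one-character str is a constant,
    an int names a variable. Each variable stands for one non-empty
    substring, the same at every occurrence (assumed encoding).
    """
    binding = {} if binding is None else binding
    if not pattern:
        return s == ""
    head, rest = pattern[0], pattern[1:]
    if isinstance(head, str):  # constant symbol
        return s.startswith(head) and matches(rest, s[len(head):], binding)
    if head in binding:  # variable already bound earlier in the pattern
        value = binding[head]
        return s.startswith(value) and matches(rest, s[len(value):], binding)
    # Unbound variable: try every non-empty prefix. This backtracking
    # is where the worst-case blow-up comes from.
    for cut in range(1, len(s) + 1):
        binding[head] = s[:cut]
        if matches(rest, s[cut:], binding):
            return True
        del binding[head]
    return False


def covers(pattern, sample):
    """True if every string of the sample is produced by the pattern."""
    return all(matches(pattern, s) for s in sample)
```

A pattern that covers the sample is only a candidate; the enumeration
described above must still select the longest such pattern.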

For implementation purposes, we need a mechanism that falls well
short of the exponential behavior of the effective procedure
mentioned above. For the purposes of this paper, the text editing
domain has two types of instructions. The first type will be called
cursor positioning instructions, while the second type will be
called data manipulation instructions. Assuming the text file is
represented as a linear array, only one cursor positioning
instruction need concern us. All cursor positioning commands such
as move left, move up, or move down can be represented as move
right instructions. Data manipulation instructions operate on one
character and do not reposition the cursor.

The method we have adopted for determining the string of interest
and the context of the string is based on the above definition of
the types of instructions available in the text editing domain. The
preprocessor scans the instruction sequence looking for an
occurrence of a data manipulation instruction. The character
associated with this instruction is then taken as the first
character of the string of interest. Other characters are added to
the string by continuing the scan until consecutive cursor
positioning instructions are encountered. A hypothesis is then
constructed consisting of three parts. The first part is the
beginning context. It is constructed from the characters that
preceded the string in the character sequence. The second part is
the string itself, and the final part is the ending context,
constructed from the characters seen after the string. For
engineering reasons, the beginning and ending contexts will each be
limited to twenty characters. The probability of the context
exceeding twenty characters on both sides of the string in the text
editing domain is small enough to ignore.

Once a hypothesis is proposed, it is set aside as an active
hypothesis and scanning of the input continues. Other cases of data
manipulation instructions surrounded by cursor positioning
instructions will result in other hypotheses being constructed. As
these hypotheses are added to the active hypothesis list, they are
checked for consistency, and if a new hypothesis causes conflicts,
they are resolved by constructing another hypothesis from the
conflicting hypotheses. To demonstrate this mechanism, we present
an example which illustrates the generation of hypotheses and their
resolution into a condition function. The example used is the
construction of the function which will recognize the string
'time'.

Suppose the text file contained the following sentences somewhere
in the file.


The time is two oclock.
It is time to go to bed.
Time the runner.
Did you run out of time?


Also, suppose the user has specified that the environment is to be
context sensitive and has begun to perform actions on the file. The
monitor could create the following instruction and character
sequence fragments from the user moving through the text file and
capitalizing these occurrences of 'time'.


(RRRRCRCRCRCRRRR ...)
(The tTiImMeE is ...)

(RRRRRRCRCRCRCRRRR ...)
(It is tTiImMeE to ...)

(RCRCRCRRRRR ...)
(TiImMeE the ...)

(... RRRRRRRRRRRCRCRCRCRR)
(... run out of tTiImMeE?)

This example is not meant to imply that the user must change all
occurrences in the text file, but he should provide enough examples
from the file to ensure his desires are understood. If the user has
not supplied a distinguishing set of examples and an incorrect
program is generated, he may add to the set of examples.

Scanning the first instruction sequence from the first data
manipulation instruction onward results in the string 'time' being
constructed. The resulting hypothesis is that the string 'time' is
within the context of 'The<sp>' and '<sp>is two oclock.'. The
hypothesis may be viewed as the following data structure.

Hypothesis 1:

Begin context: The<sp>
String: time
End context: <sp>is two oclock.
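
The scan that produces such a hypothesis can be sketched as follows.
This is an illustration only, under two assumptions read off the
trace fragments above: the instruction sequence is a string of 'R'
(move right) and 'C' (data manipulation) codes, and the i-th
character of the character sequence was recorded with the i-th
instruction. The name `extract_hypothesis` and the tuple layout are
hypothetical.

```python
MAX_CONTEXT = 20  # context limit stated in the text


def extract_hypothesis(instructions, characters):
    """Return (begin context, string, end context) for one trace.

    instructions is a string of 'R' and 'C' codes; characters[i] is
    the character recorded with instructions[i] (assumed pairing).
    """
    first = instructions.index("C")  # first data manipulation instruction
    picked = []
    i = first
    while i < len(instructions):
        if instructions[i] == "C":
            picked.append(characters[i])
        elif instructions[i:i + 2] == "RR":
            break  # two cursor moves in a row close the string
        i += 1
    begin = characters[:first][-MAX_CONTEXT:]
    # i rests on the move over the last edited character; skip it.
    end = characters[i + 1:][:MAX_CONTEXT]
    return begin, "".join(picked), end


tail = " is two oclock."
hypothesis_1 = extract_hypothesis("RRRR" + "CR" * 4 + "R" * len(tail),
                                  "The " + "tTiImMeE" + tail)
```

Here the complete first sentence is used as the trace, so the end
context runs to the period rather than stopping at the elided
fragment shown earlier.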

A second hypothesis would be generated for the next portion of the
instruction sequence, as shown below.

Hypothesis 2:

Begin context: It is<sp>
String: time
End context: <sp>to go to bed.

A comparison of these two hypotheses indicates a disagreement
between the contexts. The conflict is resolved by determining the
longest beginning and ending contexts that agree between the two
hypotheses and generating a hypothesis reflecting this agreement.
By working backward from the last character in the begin context of
both hypotheses, it is possible to ascertain that the only
character in agreement is the space. Working forward from the first
character in the end context of both hypotheses, again the only
character in agreement is the space. A third hypothesis with the
new begin and end contexts is generated as follows:

Hypothesis 3:

Begin context: <sp>
String: time
End context: <sp>
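
The resolution step amounts to taking the longest common suffix of
the two begin contexts and the longest common prefix of the two end
contexts. A minimal sketch follows; the function names and the
representation of a hypothesis as a (begin, string, end) tuple are
assumptions for illustration.

```python
def common_suffix(a, b):
    """Longest string that ends both a and b (work backward)."""
    n = 0
    while n < len(a) and n < len(b) and a[-1 - n] == b[-1 - n]:
        n += 1
    return a[len(a) - n:]


def common_prefix(a, b):
    """Longest string that starts both a and b (work forward)."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def resolve(h1, h2):
    """Combine two hypotheses over the same string into one."""
    begin = common_suffix(h1[0], h2[0])
    end = common_prefix(h1[2], h2[2])
    return begin, h1[1], end


h3 = resolve(("The ", "time", " is two oclock."),
             ("It is ", "time", " to go to bed."))
```

For the two hypotheses of the example, both contexts reduce to a
single space.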

This hypothesis specifies that the string 'time' must be preceded
and followed by a space. Note that the form of the hypothesis
implies the user is allowed to specify one string during an example
computation. It is also implied that there must be a begin and an
end context for the string. Since it is possible to have two
hypotheses where one of the context strings does not agree in any
of the characters, a method must exist to provide the appropriate
context.

Whenever the comparison between the contexts of two hypotheses
results in the null string, a disjunction is formed from the
characters immediately next to the string. For example, the last
instruction sequence given above would give the hypothesis:

Hypothesis 4:

Begin context: Did you run out of<sp>
String: time
End context: ?

A comparison between Hypothesis 3 and Hypothesis 4 would result in
the null string for the end context. Since there must be an end
context, the disjunction of <sp> and ? is formed, and this becomes
the end context for the new hypothesis. The generalization
techniques that were mentioned in the section on the context free
environment are then applied in an attempt to reduce the end
context to the most general context consistent with the data seen.
The only alteration in the generalization scheme is the lowering of
the threshold values for important sets. In this example, the
threshold value for the punctuation set would be lowered to 1 and
the ending context would become {x | x = space or x ∈
{Punctuation}}.
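
The two steps in this paragraph can be sketched as below. The sketch
makes several assumptions that should be read as such: the
punctuation set is taken from Python's `string.punctuation`, the
threshold is interpreted as the number of distinct members of a set
that must be observed before the whole set is admitted, and the
function names are hypothetical.

```python
import string

PUNCTUATION = set(string.punctuation)  # stand-in for the {Punctuation} set


def end_disjunction(end_a, end_b):
    """Characters immediately after the string in each hypothesis."""
    return {end_a[:1], end_b[:1]}


def generalize(chars, threshold=1):
    """Admit the whole punctuation set once enough members are seen.

    threshold is the assumed number of distinct punctuation
    characters required before generalizing.
    """
    seen = chars & PUNCTUATION
    if len(seen) >= threshold:
        return chars | PUNCTUATION
    return chars


disjunction = end_disjunction(" ", "?")
end_context = generalize(disjunction, threshold=1)
```

With the threshold at 1, the single observed '?' is enough to admit
every punctuation character; with a higher threshold the end context
would remain the two-character disjunction.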

The final problem to be solved is the recognition of variations in
a string. Examples of variations of a string are 'Time' and 'time',
or 'enclosure' and 'inclosure'. As mentioned, if the user intends
to capitalize all occurrences of 'time', 'Time' is to be included.
Note that these variations of the string become the compound labels
for the arcs in Figure 35. The system includes a rule that enables
the recognition of variations of strings, provided the user gives
an example of the variation. The rule simply states that the string
length will be established to be as long as the longest string
encountered during processing. Again, using the example, the
hypothesis for 'Time the runner.' would be:

Hypothesis 5:

Begin context: ... T
String: ime
End context: <sp>the runner.

It has been established by preceding user actions that the string
length for the hypothesis should be 4. By matching the pattern in
Hypothesis 5 with the string from Hypothesis 4, it can be
determined that the string in Hypothesis 5 should be expanded by
inserting a 'T' in front of the string. Another hypothesis is then
generated where the string will be the disjunction of the strings
'time' and 'Time'. The final hypothesis from the example would then
be:

Hypothesis 6:

Begin context: <sp>
String: 'time' v 'Time'
End context: {x | x = space or x ∈ Punc.}
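
The expansion of a short string to the established length can be
sketched as follows. The sketch assumes the short string occurs
verbatim inside the reference string, so that its offset tells how
many characters to borrow from each context; the name `expand`, the
tuple layout, and the begin context 'bed. T' (standing in for the
elided '... T') are illustrative assumptions.

```python
def expand(hypothesis, reference):
    """Grow the string of a hypothesis to the length of reference.

    hypothesis is a (begin, string, end) tuple. Missing characters
    are borrowed from the adjacent contexts.
    """
    begin, s, end = hypothesis
    if len(s) >= len(reference):
        return hypothesis
    offset = reference.find(s)  # where the short string sits in the long one
    if offset < 0:
        return hypothesis  # no alignment found; leave unchanged
    lead = offset
    trail = len(reference) - len(s) - offset
    if lead > len(begin) or trail > len(end):
        return hypothesis  # contexts too short to borrow from
    grown = begin[len(begin) - lead:] + s + end[:trail]
    return begin[:len(begin) - lead], grown, end[trail:]


h5 = expand(("bed. T", "ime", " the runner."), "time")
variants = {"time", h5[1]}  # the disjunction 'time' v 'Time'
```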

Once this hypothesis has been generated, it is then used to examine
the input for negative examples that can strengthen or weaken the
hypothesis. Suppose the input contained the fragment "... timely
results ...". Processing the input with Hypothesis 6 would show a
match for the string, but the end context would not agree;
therefore, the hypothesis will be strengthened by changing the end
context as shown below:

Final Hypothesis:

Begin context: <sp>
String: 'time' or 'Time'
End context: {x | x = space v x ∈ Punc. & x ∉ small letters}

After the input has been processed and a final hypothesis proposed,
the hypothesis is used to construct a procedure such as that shown
in Figure 35. The first part of the procedure to be constructed is
the set of transitions for the beginning context. The states in the
procedure are the instructions in the instruction set, and the arc
labels consist of the information in the final hypothesis. A start
state is placed in the procedure with an arc to a move right
instruction (R). Since the procedure is a string match or
look-ahead routine, all states other than the start state will be
move right instructions. Each of the states will have two arcs
exiting it. The labels on these two arcs will be the negation of
each other.

The construction is accomplished by placing the first character of
the begin context on the exiting arc going to a new move right
state. The other arc is labeled with the negation of the character,
and this arc terminates at the first move right state. Each
character of the begin context creates another move right state
labeled as mentioned.

The string from the hypothesis is then used to complete the
procedure that has been partially constructed. If the string is
composed of disjunctions, the characters in corresponding positions
are used to form disjunctions. The disjunctions are then combined
with conjunctions. The final hypothesis above provides a string of
'time' or 'Time'. The conjunction of disjunctions will be formed
as:

('T' v 't') & ('i' v 'i') & ('m' v 'm') & ('e' v 'e')

Upon reduction the string will be expressed as:

('T' v 't') & 'i' & 'm' & 'e'

Each disjunction becomes a label on an arc to a new move right
state, and the negation becomes the label on an arc back to the
original move right state.

Finally, the end context is added in the same manner as the begin
context. The first character becomes the label on the arc leaving
the last move right state created from the string, and new states
are added for each character in the end context. The result of
these operations is displayed in Figure 35.
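
The construction can be sketched as a list of arc labels, one per
move right state, together with a scanner that follows them. This is
an illustration under stated assumptions, not the procedure of
Figure 35 itself: each label is represented as a set of accepted
characters, the end context is a single character class, the small
letter exclusion is omitted because the class contains no letters,
and the names `build_labels` and `find_matches` are hypothetical.

```python
import string


def build_labels(begin, variants, end_class):
    """One arc label (set of accepted characters) per move right state."""
    labels = [{c} for c in begin]
    # Conjunction of per-position disjunctions; building a set performs
    # the reduction of ('i' v 'i') to 'i'.
    labels += [set(chars) for chars in zip(*variants)]
    labels.append(set(end_class))
    return labels


def find_matches(text, labels, begin_len):
    """Return the index in text where each matched string starts."""
    hits = []
    state = 0
    for pos, ch in enumerate(text):
        if ch in labels[state]:
            state += 1
            if state == len(labels):  # requested operation would run here
                hits.append(pos - len(labels) + 1 + begin_len)
                state = 0
        else:
            state = 0  # negated arc: back to the first move right state
    return hits


labels = build_labels(" ", ["time", "Time"], set(string.punctuation) | {" "})
text = "The time is two oclock. It is timely. Time the runner."
hits = find_matches(text, labels, begin_len=1)
```

On this text the scanner reports the two intended occurrences and
rejects 'timely'. Because the sketch follows the construction
literally, a character that causes a mismatch is not re-examined, so
an occurrence preceded by two spaces would be missed; a fallback to
an intermediate state would be needed to handle that case.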
IV.   CONCLUSIONS AND RECOMMENDATIONS

A.   SYNTHESIZER 

The synthesizer that has been implemented for this thesis will
produce programs from example computations in a reasonable amount
of time. The system response for most of the traces was within 10
seconds on a Digital Equipment Corporation PDP-11/50 minicomputer.
The response time is a function of the length of the trace and the
number of multiple occurrences of a particular instruction or set
of instructions in the final algorithm, with multiple occurrences
of an instruction affecting response time the most. As Biermann
[17] has noted, this has a nice implication for programming by
example because most algorithms do not exhibit the characteristic
of having a large number of instances of the same instruction. In
other words, almost all multiple occurrences of an instruction in
an input trace are indicative of a loop in the algorithm.

In all of the test cases except those that required a large number
of backups, static processing accounted for at least half of the
total response time. Future modifications to the synthesizer which
would decrease the total response time could be directed toward
designing the static processing stage more efficiently. However,
the trade-off between static processing and dynamic processing must
be kept in perspective. Static processing is a linear function of
the length of the trace, whereas dynamic processing, since it is an
enumerative search technique, is an exponential function of the
length of the trace.

Another area which should be considered is the dynamic processing
stage. There exists a plethora of research questions within this
area, the primary one being: Can more information be gleaned from
the input trace during static processing which will decrease the
search time for dynamic processing? Difference sets and
couple-classes provide some powerful mechanisms for decreasing the
amount of search; however, lower bounds computations on the number
of states required by the machine often increase the amount of
search. Lower bounds are restrictive in nature. They are designed
to force the final algorithm into a minimum state configuration
which, in many cases, causes extra search time. Relaxation of the
lower bounds computation will result in a final algorithm which may
not be expressed in a minimum number of states, but which will
still be deterministic. There might be better methods of initially
computing the number of states which would result in a closer
estimate of the actual number of states required for the algorithm.
Obviously, the closer the initial guess is to the actual
requirement, the less backup incurred, and, therefore, the less
search time required.

Since the amount of search required is governed by the failure
memory entries, the more dense the failure memory can be made, the
more directed the search becomes. So another area for research is
to determine if more information exists in the failure memory
entries than is currently being used. How much information do the
structure factor and the free state factor provide? Is there
another factor which would be useful?

Finally, a more general question can be addressed. The underlying
structure of this technique is an enumerative search. Can the
technique be generalized to include other algorithms which are
enumerative in nature? What modifications to the failure memory are
needed? How would difference sets and couple-classes be redefined?

B.   CONDITION  PROCESSING 

The condition processor front-end to the synthesizer relieves the
user from worrying about some of the control structure
considerations by automatically generating conditions. Another
addition which would increase the power of the synthesizer is an
automatic loop variable generator as discussed by Biermann [18].
Although the text editing environment has been used in this thesis
work, the part of the condition processor design which deals with a
context free environment is general enough that it could be
designed to operate in any domain.




Condition generation in a context sensitive environment is a much
harder problem, further complicated by the requisite pattern
matching and pattern generation. Before this type of condition
generation can be generalized, much work has to be done to increase
the efficiency of pattern generation schemes. Angluin [19] has
shown a pattern generation scheme which is a polynomial time
algorithm for pattern generation with one variable, but the domain
we have examined will require at least two variables. No polynomial
time algorithm is known for pattern generation with two variables.
Heuristic techniques will probably be necessary to provide methods
of pattern generation which will be fast enough to be useful over a
wide range of problems.